
running dns query timeout on kubespray-2.27 #12149


Closed
bgkim0420 opened this issue Apr 23, 2025 · 4 comments

Comments

@bgkim0420

What happened?

After upgrading from Kubernetes version 1.29 to 1.31, DNS resolution times out. The issue is not limited to specific pods; it occurs in all pods running on version 1.31 and did not happen in version 1.29.

In version 1.29, I performed DNS queries using the following command:

$> kubectl exec -it grafana-server-76cf4f8d7d-r4sgx  -n grafana -- nslookup google.com
Server:         10.233.0.3
Address:        10.233.0.3:53

Non-authoritative answer:
Name:   google.com
Address: 142.250.76.142

*** Can't find google.com: No answer

When running nslookup in version 1.31, the output is as follows:

$> kubectl exec -it -n grafana grafana-server-6f8d7bfdb6-jddwk -- nslookup google.com
;; Got recursion not available from 10.233.0.3
Server:         10.233.0.3
Address:        10.233.0.3#53

;; communications error to 10.233.0.3#53: timed out

What did you expect to happen?

DNS queries should return a normal response.

How can we reproduce it (as minimally and precisely as possible)?

$> kubectl exec -it -n grafana grafana-server-6f8d7bfdb6-jddwk -- nslookup google.com
;; Got recursion not available from 10.233.0.3
Server:         10.233.0.3
Address:        10.233.0.3#53

;; communications error to 10.233.0.3#53: timed out

OS

RHEL 9

Version of Ansible

3.13.0

Version of Python

Python 3.10.12

Version of Kubespray (commit)

release-2.27

Network plugin used

calico

Full inventory with variables

[all]
test-master1 ansible_host=192.168.0.100
test-worker1 ansible_host=192.168.0.101

[kube_control_plane]
test-master1

[etcd]
test-master1

[kube_node]
test-master1
test-worker1

[calico_rr]

[ingress]
test-worker1
test-master1

[test_cluster:children]
kube_control_plane
kube_node
calico_rr

[ingress:vars]
node_labels={"node-role.kubernetes.io/ingress":"true"}

Command used to invoke ansible

ansible-playbook -i ./inventory/k8stest/inventory.ini cluster.yml

Output of ansible run

PLAY RECAP *********************************************************************
test-master1 : ok=728 changed=127 unreachable=0 failed=0 skipped=1075 rescued=0 ignored=6
test-worker1 : ok=447 changed=74 unreachable=0 failed=0 skipped=644 rescued=0 ignored=1

Wednesday 23 April 2025 02:00:12 +0000 (0:00:00.089) 0:20:31.516 *******

download : Download_container | Download image if required ------------ 138.48s
download : Download_container | Download image if required ------------- 97.96s
container-engine/cri-o : Download_file | Download item ----------------- 88.70s
download : Download_container | Download image if required ------------- 78.13s
download : Download_container | Download image if required ------------- 67.24s
download : Download_container | Download image if required ------------- 58.12s
download : Download_container | Download image if required ------------- 54.57s
download : Download_container | Download image if required ------------- 48.70s
download : Download_container | Download image if required ------------- 43.75s
download : Download_container | Download image if required ------------- 40.30s
container-engine/skopeo : Download_file | Download item ---------------- 39.05s
kubernetes/kubeadm : Join to cluster if needed ------------------------- 20.65s
download : Download_container | Download image if required ------------- 18.77s
network_plugin/calico : Wait for calico kubeconfig to be created ------- 14.10s
kubernetes/preinstall : Preinstall | restart kube-apiserver crio/containerd -- 10.89s
kubernetes-apps/helm : Helm | Install PyYaml --------------------------- 10.62s
kubernetes/control-plane : Kubeadm | Initialize first control plane node -- 10.21s
network_plugin/calico : Calico | Create Calico ipam manifests ----------- 8.48s
container-engine/crun : Download_file | Download item ------------------- 7.51s
download : Download_file | Download item -------------------------------- 6.71s

Anything else we need to know

Same issue as kubernetes/kubernetes#131396.

bgkim0420 added the kind/bug label on Apr 23, 2025
@vdveldet

Most likely this is related to a network issue. You can add logging to CoreDNS to get more info, as described here.
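For reference, a minimal sketch of enabling query logging, assuming the CoreDNS ConfigMap and Deployment are both named coredns in kube-system (the Kubespray default) and the pods carry the k8s-app=kube-dns label; adjust names if your deployment differs:

# Add the "log" plugin to the server block of the Corefile
$> kubectl -n kube-system edit configmap coredns
#   .:53 {
#       log        <- add this line
#       errors
#       health
#       ...
#   }

# Restart CoreDNS so the new Corefile is loaded, then follow the logs
$> kubectl -n kube-system rollout restart deployment coredns
$> kubectl -n kube-system logs -l k8s-app=kube-dns -f

Each query and its response code will then appear in the CoreDNS logs, which helps distinguish a CoreDNS problem from packet loss between the pod and the DNS service.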

@bgkim0420

Most likely this is related to a network issue. You can add logging to CoreDNS to get more info, as described here.

I checked the Calico status on 1.29 and 1.31.

1.29 BGP configuration and IP pool check:

$>./calicoctl.sh get bgpConfiguration
NAME      LOGSEVERITY   MESHENABLED   ASNUMBER
default   Info          true          64512

$> ./calicoctl.sh get ippool -o wide
NAME           CIDR             NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-pool   10.233.64.0/18   true   Never      Always      false      false              all()

1.31

$> ./calicoctl.sh get bgpconfiguration
NAME      LOGSEVERITY   MESHENABLED   ASNUMBER
default   Info          true          64512

$> ./calicoctl.sh get ippool -o wide
NAME           CIDR             NAT    IPIPMODE   VXLANMODE   DISABLED   DISABLEBGPEXPORT   SELECTOR
default-pool   10.243.64.0/18   true   Never      Always      false      false              all()
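Since VXLANMODE is Always in both clusters, one additional check worth doing on the nodes is whether TX checksum offload is enabled on the VXLAN device (vxlan.calico is the Calico default interface name; adjust if yours differs):

# Inspect the offload settings of the Calico VXLAN interface
$> ethtool -k vxlan.calico | grep tx-checksum
# tx-checksum-ip-generic: on

On some kernels, offloaded checksums on this interface are known to produce bad VXLAN UDP checksums, which shows up exactly as the kind of intermittent timeouts seen above.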

@bgkim0420

The issue was resolved after redeploying with Calico downgraded from version 3.29.1 to 3.27.4. Could this be a problem in Calico itself?
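If the downgrade is the route taken, a sketch of pinning the Calico version in the Kubespray inventory (calico_version is the Kubespray variable; the group_vars path below follows the sample inventory layout, and each Kubespray release only supports a limited set of Calico versions, so check its defaults first):

# inventory/k8stest/group_vars/k8s_cluster/k8s-net-calico.yml
calico_version: "v3.27.4"

# Re-run the playbook so the pinned version is deployed
$> ansible-playbook -i ./inventory/k8stest/inventory.ini cluster.yml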

@bgkim0420

It was confirmed that the issue was caused by the checksum offload feature in Calico 3.28. After disabling this feature, normal communication was restored. Thank you.
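For anyone landing here, two commonly referenced ways to disable the offload without downgrading Calico (verify against the Calico docs for the version in use):

# Option 1: tell Felix to treat checksum offload as broken, so it disables it on vxlan.calico itself
$> kubectl patch felixconfiguration default --type=merge \
     -p '{"spec":{"featureDetectOverride":"ChecksumOffloadBroken=true"}}'

# Option 2: turn the offload off manually on each node (not persistent across interface recreation)
$> ethtool -K vxlan.calico tx-checksum-ip-generic off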
