K8s: no DNS resolution with any of the test images used

Hi community and K8s experts,

I installed a clean K8s cluster on virtual machines (Debian 10). After the installation and the integration into my landscape, I fixed the CoreDNS resolution as a first step. I then ran further tests and found the following. The test setup consisted of an nslookup of google.com and a lookup of a local pod via its K8s DNS address.
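For reference, test pods like the ones used below can be created along these lines (a sketch; the image tags match the versions referenced in the tests, but the exact creation commands may differ):

kubectl create namespace development
kubectl run busybox -n development --image=busybox:1.28 --restart=Never -- sleep 3600
kubectl run dnsutils --image=gcr.io/kubernetes-e2e-test-images/dnsutils:1.3 --restart=Never -- sleep 3600
kubectl run dnsutilsalpine -n development --image=alpine:3.12 --restart=Never -- sleep 3600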

Basic setup:

  • K8s version: 1.19.0
  • K8s setup: 1 master + 2 worker nodes
  • Based on: Debian 10 VMs
  • CNI: Flannel

Status of CoreDNS Pods

kube-system            coredns-xxxx 1/1     Running   1          26h
kube-system            coredns-yyyy 1/1     Running   1          26h

CoreDNS Log:

.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7

CoreDNS config:

apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: ""
  name: coredns
  namespace: kube-system
  resourceVersion: "219"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: xxx

CoreDNS Service

kubectl -n kube-system get svc -o wide
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE   SELECTOR
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   15d   k8s-app=kube-dns
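A quick sanity check at this point (a standard step from the Kubernetes DNS debugging guide) is to confirm that the kube-dns service actually has the CoreDNS pod IPs as endpoints; if the list is empty, kube-proxy has nowhere to forward the queries:

kubectl -n kube-system get endpoints kube-dns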

Kubelet config yaml

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Output of the pod's resolv.conf

/ # cat /etc/resolv.conf 
nameserver 10.96.0.10
search development.svc.cluster.local svc.cluster.local cluster.local invalid
options ndots:5

Output of the host's resolv.conf

cat /etc/resolv.conf 
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 213.136.95.11
nameserver 213.136.95.10
search invalid
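To check whether the forwarding path is the problem, one can also query one of these upstream resolvers directly from inside a pod, bypassing CoreDNS entirely (a sketch; 213.136.95.11 is simply the first upstream from the output above):

kubectl exec -i -t busybox -n development -- nslookup google.com 213.136.95.11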

Output of the host's /run/flannel/subnet.env

cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
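The MTU of 1450 suggests the vxlan backend, which carries inter-node pod traffic over UDP port 8472 by default. If that port is blocked between the VMs, queries only succeed when they happen to hit a CoreDNS pod on the local node, which would match the sporadic behaviour seen below. A quick way to check is to look up the CoreDNS pod IPs and test reachability from a pod on another node (a sketch; replace the placeholder with a real pod IP):

kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl exec -i -t busybox -n development -- ping -c 3 <coredns-pod-ip>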

Test setup

kubectl exec -i -t busybox -n development -- nslookup google.com
kubectl exec -i -t busybox -n development -- nslookup development.default

Busybox v1.28 image

  • google.com nslookup works, but the answer takes very long
  • the local pod DNS address lookup fails, and the answer also takes very long (see the note on ndots below)
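The long answer times are at least partly explained by the ndots:5 setting in the pod's resolv.conf above: a short name like google.com is first tried against every search domain (google.com.development.svc.cluster.local, google.com.svc.cluster.local, and so on) before the plain name is queried, so every slow or failing search-domain lookup multiplies the delay. Querying the fully qualified name with a trailing dot skips the search list and isolates that effect (a sketch):

kubectl exec -i -t busybox -n development -- nslookup google.com.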

Test setup

kubectl exec -i -t dnsutils -- nslookup google.com
kubectl exec -i -t busybox -n development -- nslookup development.default

K8s dnsutils test image

  • google.com nslookup works only sporadically; it feels like the address is sometimes served from a cache and sometimes not resolved at all
  • the local pod DNS address lookup also works only sporadically, with the same cache-like behaviour (see the per-pod check below)
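The sporadic behaviour would fit a scenario in which only one of the two CoreDNS replicas is reachable, since queries are load-balanced across them through the service VIP. Querying each CoreDNS pod IP directly instead of 10.96.0.10 shows whether one backend consistently fails (a sketch; the placeholder IPs come from kubectl -n kube-system get pods -o wide):

kubectl exec -i -t dnsutils -- nslookup kubernetes.default.svc.cluster.local <coredns-pod-1-ip>
kubectl exec -i -t dnsutils -- nslookup kubernetes.default.svc.cluster.local <coredns-pod-2-ip>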

Test setup

kubectl exec -i -t dnsutilsalpine -n development -- nslookup google.com
kubectl exec -i -t dnsutilsalpine -n development -- nslookup development.default

Alpine image v3.12

  • google.com nslookup works only sporadically; again it feels like the address is sometimes served from a cache and sometimes not resolved at all
  • the local pod DNS address lookup fails

The logs are empty. Do you have an idea where the problem is?
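For reference, query logging can be enabled by adding the log plugin to the Corefile shown above (the reload plugin that is already configured picks the change up automatically); every query that reaches CoreDNS is then printed, which at least shows whether the queries arrive at all (a sketch):

kubectl -n kube-system edit configmap coredns
# then add a line containing just "log" directly after ".:53 {"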

IP Routes master node

default via X.X.X.X dev eth0 onlink 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
X.X.X.X via X.X.X.X dev eth0 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
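Before switching the CNI, it is also worth verifying that the flannel DaemonSet pods are Running on every node and that their logs are clean (a sketch; the label selector depends on the kube-flannel.yml version that was applied):

kubectl -n kube-system get pods -l app=flannel -o wide
kubectl -n kube-system logs -l app=flannel --tail=20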

UPDATE

I reinstalled the cluster, this time with Calico as the CNI, and I have the same problem.

UPDATE 2

After a detailed error analysis under Calico, I found that the corresponding Calico pods were not working properly. Digging deeper, I discovered that I had not opened port 179 in the firewall. After fixing this, the pods came up correctly, and name resolution is now working as well.
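For anyone running into the same thing: Calico uses BGP between the nodes on TCP port 179, so that port has to be allowed in the firewall on every node. On Debian 10 that could look like this (a sketch, depending on whether ufw or plain iptables manages the firewall):

ufw allow 179/tcp
# or, with plain iptables:
iptables -A INPUT -p tcp --dport 179 -j ACCEPT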

Answer

Unable to post that much via comments. Posting as an answer.

I checked the guide you’ve been referring to and set up my own test cluster (GCP, 3x Debian 10 VMs).

The difference is that in my ~/kube-cluster/master.yml I’ve set a different link to kube-flannel.yml (and the content of that file differs from the file in the guide :))

$ grep http master.yml 
      shell: kubectl apply -f  https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml >> pod_network_setup.txt

On my cluster:

$ kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
instance-1   Ready    master   2m48s   v1.19.0
instance-2   Ready    <none>   38s     v1.19.0
instance-3   Ready    <none>   38s     v1.19.0

kubectl get pods -o wide -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
coredns-f9fd979d6-8sxg7              1/1     Running   0          4m48s   10.244.0.2    instance-1   <none>           <none>
coredns-f9fd979d6-z5gdl              1/1     Running   0          4m48s   10.244.0.3    instance-1   <none>           <none>

kube-flannel-ds-4khll                1/1     Running   0          2m58s   10.156.0.21   instance-3   <none>           <none>
kube-flannel-ds-h8d9l                1/1     Running   0          2m58s   10.156.0.20   instance-2   <none>           <none>
kube-flannel-ds-zhzbf                1/1     Running   0          4m49s   10.156.0.19   instance-1   <none>           <none>

$ kubectl -n kube-system get svc -o wide
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE     SELECTOR
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   6m15s   k8s-app=kube-dns

sammy@instance-1:~$ ip route
default via 10.156.0.1 dev ens4 
10.156.0.1 dev ens4 scope link 
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1 
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink 
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink 
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown


I see no DNS lag issues.

kubectl create deployment busybox --image=nkolchenko/enea:server_go_latest
deployment.apps/busybox created

sammy@instance-1:~$ time kubectl exec -it busybox-6f744547bf-hkxnk -- nslookup default.default
Server:         10.96.0.10
Address:        10.96.0.10:53

** server can't find default.default: NXDOMAIN

** server can't find default.default: NXDOMAIN

command terminated with exit code 1

real    0m0.227s
user    0m0.106s
sys     0m0.012s


sammy@instance-1:~$ time kubectl exec -it busybox-6f744547bf-hkxnk -- nslookup google.com
Server:         10.96.0.10
Address:        10.96.0.10:53

Non-authoritative answer:
Name:   google.com
Address: 172.217.22.78

Non-authoritative answer:
Name:   google.com
Address: 2a00:1450:4001:820::200e


real    0m0.223s
user    0m0.102s
sys     0m0.012s

Let me know if you need me to run any other tests; I’ll keep this cluster throughout the weekend and then tear it down.

UPDATE:

$ cat ololo 
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always

$ kubectl create -f ololo 
pod/dnsutils created


$ kubectl get -A all  -o wide | grep dns
default       pod/dnsutils                             1/1     Running   0          63s     10.244.2.8    instance-2   <none>           <none>
kube-system   pod/coredns-cc8845745-jtvlh              1/1     Running   0          10m     10.244.1.3    instance-3   <none>           <none>
kube-system   pod/coredns-cc8845745-xxh28              1/1     Running   0          10m     10.244.0.4    instance-1   <none>           <none>
kube-system   pod/coredns-cc8845745-zlv84              1/1     Running   0          10m     10.244.2.6    instance-2   <none>           <none>

instance-1:~$ kubectl exec -i -t dnsutils -- time nslookup google.com
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   google.com
Address: 172.217.21.206
Name:   google.com
Address: 2a00:1450:4001:818::200e

real    0m 0.01s
user    0m 0.00s
sys     0m 0.00s




Attribution
Source: Link, Question Author: ZPascal, Answer Author: Nick
