Kubernetes DNS Resolution Failure
Causes and Fixes
DNS resolution failures in Kubernetes occur when pods cannot resolve service names, external hostnames, or cluster DNS entries. This typically manifests as connection errors mentioning 'name resolution failed' or 'no such host' and is most commonly caused by issues with CoreDNS or the pod's DNS configuration.
Symptoms
- Application logs show 'could not resolve host' or 'name resolution failed'
- nslookup or dig commands inside pods return SERVFAIL or timeout
- Pods can reach IPs directly but not hostnames
- Service-to-service communication fails while pod-to-pod by IP works
- CoreDNS pods are crashing, restarting, or in error state
Step-by-Step Troubleshooting
DNS resolution is fundamental to Kubernetes networking — services discover each other by name, and applications rely on DNS for both internal and external communication. When DNS breaks, nearly everything breaks. This guide systematically diagnoses DNS failures from the pod level up through CoreDNS and upstream resolvers.
1. Confirm the DNS Failure
First, verify that DNS is actually failing from inside a pod. Launch a debug pod with DNS tools.
kubectl run dns-debug --image=busybox:1.36 --restart=Never -- sleep 3600
kubectl exec dns-debug -- nslookup kubernetes.default
If this times out or returns SERVFAIL, DNS is broken. Try a few different queries to narrow down the scope.
# Test cluster service resolution
kubectl exec dns-debug -- nslookup kubernetes.default.svc.cluster.local
# Test external resolution
kubectl exec dns-debug -- nslookup google.com
# Test with a specific DNS server (the kube-dns ClusterIP)
kubectl exec dns-debug -- nslookup kubernetes.default 10.96.0.10
If cluster names fail but external names work (or vice versa), the issue is more specific. If everything fails, the problem is likely with CoreDNS availability or network connectivity to it.
2. Check CoreDNS Pod Health
CoreDNS handles all cluster DNS. Verify its pods are running.
kubectl get pods -n kube-system -l k8s-app=kube-dns
# Check CoreDNS deployment
kubectl get deployment -n kube-system coredns
# Check logs for errors
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100
Common log errors include:
- HINFO: read udp ... i/o timeout — upstream DNS is unreachable
- plugin/loop: Loop ... detected — DNS loop causing CoreDNS to crash
- connection refused — upstream resolver is down
If CoreDNS pods are in CrashLoopBackOff, check for the loop detection issue.
kubectl logs -n kube-system <coredns-pod> --previous | grep -i loop
3. Check the kube-dns Service
The kube-dns Service provides the ClusterIP that pods use as their DNS server.
kubectl get service kube-dns -n kube-system
# Verify endpoints exist
kubectl get endpoints kube-dns -n kube-system
The service should have endpoints pointing to the CoreDNS pod IPs. If the endpoints list is empty, CoreDNS pods are not ready, and DNS queries have nowhere to go.
4. Check the Pod's DNS Configuration
Verify what DNS configuration the affected pod is using.
kubectl exec <pod-name> -- cat /etc/resolv.conf
A typical resolv.conf looks like:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Verify that:
- The nameserver IP matches the kube-dns Service ClusterIP
- The search domains are correct for the pod's namespace
- The ndots value is appropriate (default is 5)
If the resolv.conf is wrong, check the kubelet configuration.
# On the node, check kubelet's DNS settings
ps aux | grep kubelet | grep -oP 'cluster-dns[= ]\S+'
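On clusters where the kubelet reads a configuration file (the usual setup on kubeadm-based clusters, typically at /var/lib/kubelet/config.yaml), the DNS settings live there rather than on the command line. A sketch of the relevant fields — the file path varies by distribution:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
clusterDNS:
  - 10.96.0.10          # must match the kube-dns Service ClusterIP
clusterDomain: cluster.local

If clusterDNS here does not match the kube-dns Service ClusterIP, every pod the kubelet creates will get a broken resolv.conf.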
5. Check CoreDNS Configuration (Corefile)
The CoreDNS configuration may have errors.
kubectl get configmap coredns -n kube-system -o yaml
Verify the Corefile has the correct structure. Key things to check:
- The kubernetes plugin is configured with the correct cluster domain (usually cluster.local)
- The forward directive points to valid upstream DNS servers
- There are no syntax errors
A common working Corefile structure:

.:53 {
    kubernetes cluster.local in-addr.arpa ip6.arpa {
        pods insecure
        fallthrough in-addr.arpa ip6.arpa
    }
    forward . /etc/resolv.conf
    cache 30
    loop
    reload
    loadbalance
}
If CoreDNS is configured to forward to /etc/resolv.conf on the node, and the node's resolv.conf points back to the cluster DNS (creating a loop), CoreDNS will detect the loop and crash. Fix this by pointing forward to explicit upstream DNS servers.
# Edit the ConfigMap to fix the forward target
kubectl edit configmap coredns -n kube-system
# Change: forward . /etc/resolv.conf
# To: forward . 8.8.8.8 8.8.4.4
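If you manage CoreDNS declaratively (e.g. via GitOps) rather than with kubectl edit, the same fix lands in the ConfigMap manifest. A sketch of the relevant portion — your cluster's Corefile may carry additional plugins such as errors or health:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . 8.8.8.8 8.8.4.4   # explicit upstreams instead of /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }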
6. Test Network Connectivity to CoreDNS
Even if CoreDNS is running, network issues can prevent pods from reaching it.
# Get the CoreDNS pod IPs
kubectl get pods -n kube-system -l k8s-app=kube-dns -o wide
# Test direct connectivity from a debug pod
kubectl exec dns-debug -- nc -uzv <coredns-pod-ip> 53
# Test connectivity to the kube-dns service IP
kubectl exec dns-debug -- nc -uzv 10.96.0.10 53
If you cannot reach CoreDNS, check for NetworkPolicies that might be blocking DNS traffic.
# List NetworkPolicies in the pod's namespace
kubectl get networkpolicy -n <namespace>
# Check if any policy blocks egress to kube-system or port 53
kubectl describe networkpolicy -n <namespace>
If a NetworkPolicy blocks DNS, add an egress rule allowing UDP and TCP port 53 to the kube-dns service.
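A minimal egress rule for this looks roughly like the following. It is a sketch: the policy name and namespace are hypothetical, and it assumes the kube-system namespace carries the kubernetes.io/metadata.name label (set automatically on Kubernetes 1.21+):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: my-app          # hypothetical; use the affected pod's namespace
spec:
  podSelector: {}            # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53

Allow both UDP and TCP: most queries use UDP, but responses larger than 512 bytes fall back to TCP.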
7. Check for DNS Rate Limiting or Overload
In high-traffic clusters, CoreDNS can become a bottleneck.
# Check CoreDNS resource usage
kubectl top pods -n kube-system -l k8s-app=kube-dns
# Check CoreDNS metrics if available
kubectl exec -n kube-system <coredns-pod> -- sh -c 'wget -qO- http://localhost:9153/metrics' | grep coredns_dns_requests_total
If CoreDNS is CPU-throttled or memory-constrained, scale it up.
# Scale CoreDNS
kubectl scale deployment coredns -n kube-system --replicas=4
# Or adjust resource limits
kubectl set resources deployment coredns -n kube-system --requests=cpu=200m,memory=256Mi --limits=cpu=500m,memory=512Mi
For large clusters, consider deploying NodeLocal DNSCache to offload queries from CoreDNS and reduce cross-node DNS traffic.
8. Fix the ndots Issue
The default ndots:5 setting means any name with fewer than 5 dots will have each search domain appended before trying the absolute name. This causes excessive DNS queries for external names.
# A lookup for "api.example.com" generates these queries:
# api.example.com.default.svc.cluster.local
# api.example.com.svc.cluster.local
# api.example.com.cluster.local
# api.example.com (finally the actual query)
If this is causing performance issues, set ndots to a lower value in your pod spec.
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
9. Restart CoreDNS
If configuration changes were made, restart CoreDNS to pick them up.
kubectl rollout restart deployment coredns -n kube-system
kubectl rollout status deployment coredns -n kube-system
10. Verify DNS Is Working
After fixing the issue, confirm DNS resolution works end-to-end.
# Test cluster DNS
kubectl exec dns-debug -- nslookup kubernetes.default.svc.cluster.local
# Test cross-namespace service resolution
kubectl exec dns-debug -- nslookup <service-name>.<namespace>.svc.cluster.local
# Test external DNS
kubectl exec dns-debug -- nslookup google.com
# Clean up debug pod
kubectl delete pod dns-debug
All queries should return valid responses with IP addresses. If cluster DNS works but external DNS does not, the issue is with the upstream forwarder configuration. If both work, DNS resolution is fully restored.
How to Explain This in an Interview
I would explain how Kubernetes DNS works end-to-end: the kubelet configures each pod's /etc/resolv.conf to point at the kube-dns ClusterIP, CoreDNS watches the API server for Service and Endpoint changes, and queries are resolved using search domains that allow short names like 'myservice' to resolve within the same namespace. I'd discuss the ndots setting (default 5) and how it affects query behavior, the role of the kubernetes plugin in CoreDNS's Corefile, and how to debug DNS at each layer — from the pod's resolv.conf to CoreDNS logs to upstream resolvers. I'd also mention NodeLocal DNSCache as a solution for DNS scalability.
Prevention
- Deploy CoreDNS with at least 2 replicas and a PodDisruptionBudget
- Monitor CoreDNS metrics (cache hits, errors, latency) with Prometheus
- Use NodeLocal DNSCache to reduce CoreDNS load and improve latency
- Configure appropriate ndots value to reduce unnecessary DNS queries
- Ensure NetworkPolicies always allow egress to kube-dns
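For the first prevention item, a PodDisruptionBudget for CoreDNS might look like this — a sketch, and note that kubeadm clusters already ship two CoreDNS replicas and some managed distributions maintain a PDB for you:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns-pdb
  namespace: kube-system
spec:
  minAvailable: 1            # keep at least one CoreDNS pod up during node drains
  selector:
    matchLabels:
      k8s-app: kube-dns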