How Do You Debug DNS Issues in Kubernetes?

Level: intermediate · Tags: dns, devops, sre, backend developer, CKA, CKAD
TL;DR

Debug Kubernetes DNS by checking CoreDNS Pod health, verifying resolv.conf configuration, testing lookups from within Pods using nslookup or dig, inspecting CoreDNS logs, and validating that the kube-dns Service and endpoints exist. Common issues include CoreDNS crashes, misconfigured network policies blocking DNS, and ndots settings causing slow lookups.

Detailed Answer

DNS issues are among the most common problems in Kubernetes clusters. A systematic debugging approach can resolve most issues quickly.

Step 1: Check CoreDNS Health

# Are CoreDNS Pods running?
kubectl get pods -n kube-system -l k8s-app=kube-dns
# NAME                       READY   STATUS    RESTARTS   AGE
# coredns-5d78c9869d-abc12   1/1     Running   0          5d
# coredns-5d78c9869d-def34   1/1     Running   0          5d

# Check for crashes or restarts
kubectl describe pod -n kube-system -l k8s-app=kube-dns

# View CoreDNS logs
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

Common CoreDNS issues:

  • CrashLoopBackOff: Usually a Corefile syntax error
  • OOMKilled: Increase memory limits
  • High restart count: Check for forwarding loops

Step 2: Verify the kube-dns Service

# Check the Service exists and has endpoints
kubectl get service kube-dns -n kube-system
# NAME       TYPE        CLUSTER-IP   PORT(S)
# kube-dns   ClusterIP   10.96.0.10   53/UDP,53/TCP,9153/TCP

kubectl get endpoints kube-dns -n kube-system
# NAME       ENDPOINTS
# kube-dns   10.244.0.5:53,10.244.0.6:53  ← Must have endpoints

If endpoints are empty, CoreDNS Pods are not Running/Ready.

Step 3: Test DNS from a Pod

# Launch a debug Pod
kubectl run dns-debug --rm -it --image=busybox:1.36 -- sh

# Test cluster DNS
nslookup kubernetes.default.svc.cluster.local
# Server:    10.96.0.10
# Address:   10.96.0.10:53
# Name:      kubernetes.default.svc.cluster.local
# Address:   10.96.0.1

# Test a Service in your namespace
nslookup api-service.default.svc.cluster.local

# Test external DNS
nslookup google.com

# Use dig for more detail
kubectl run dns-debug --rm -it --image=registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3 -- dig api-service.default.svc.cluster.local

Step 4: Check Pod resolv.conf

kubectl exec my-pod -- cat /etc/resolv.conf
# nameserver 10.96.0.10
# search default.svc.cluster.local svc.cluster.local cluster.local
# options ndots:5

Verify:

  • nameserver matches the kube-dns Service ClusterIP
  • search includes your namespace
  • ndots is set (default 5)
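A quick way to check these is to compare the Pod's nameserver against the kube-dns ClusterIP directly (a sketch — my-pod is a placeholder for your Pod's name):

```shell
# Nameserver the Pod actually uses
kubectl exec my-pod -- awk '/^nameserver/ {print $2}' /etc/resolv.conf

# ClusterIP it should match (with dnsPolicy: ClusterFirst)
kubectl get service kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'

# A Pod with dnsPolicy: Default uses the node's resolv.conf
# and bypasses cluster DNS entirely
kubectl get pod my-pod -o jsonpath='{.spec.dnsPolicy}'
```

If the two IPs differ, or dnsPolicy is not ClusterFirst, the Pod is not using cluster DNS at all.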

Common Issues and Solutions

Issue: CoreDNS Forwarding Loop

Symptom: CoreDNS repeatedly crashes with "Loop detected" in logs.

Cause: CoreDNS forwards to itself via the node's resolv.conf.

Fix: Configure forward to use explicit upstream DNS:

forward . 8.8.8.8 8.8.4.4 {
    max_concurrent 1000
}
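The forward block lives in the Corefile. One way to apply the change, assuming the standard coredns ConfigMap and Deployment names:

```shell
# Edit the Corefile: replace `forward . /etc/resolv.conf`
# with explicit upstreams such as `forward . 8.8.8.8 8.8.4.4`
kubectl edit configmap coredns -n kube-system

# CoreDNS picks up Corefile changes automatically if the `reload`
# plugin is enabled; otherwise restart the Deployment
kubectl rollout restart deployment coredns -n kube-system
```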

Issue: Network Policy Blocking DNS

Symptom: Pods cannot resolve any DNS names after applying network policies.

Fix: Allow egress to CoreDNS:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
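Note that the namespaceSelector above relies on the kubernetes.io/metadata.name label, which is set automatically on namespaces in Kubernetes 1.21+. After applying the policy, confirm resolution works again from an affected namespace (sketch):

```shell
# Should succeed once egress to kube-system on port 53 is allowed
kubectl run np-test --rm -it --image=busybox:1.36 -- \
  nslookup kubernetes.default.svc.cluster.local
```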

Issue: Slow External DNS Resolution

Symptom: External names like api.stripe.com take several seconds to resolve.

Cause: ndots:5 causes multiple failed lookups with search domains before the real lookup.

Fix: Use FQDNs with trailing dots or reduce ndots:

spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
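The expansion behavior can be simulated in plain shell to see exactly why the extra lookups happen. This is illustrative only — the search domains shown assume a Pod in the default namespace, as in the resolv.conf example above:

```shell
# Simulate the search-domain expansion the resolver performs under ndots:5.
# "api.stripe.com" has 2 dots (< 5), so every search domain is tried first.
name="api.stripe.com"
search_domains="default.svc.cluster.local svc.cluster.local cluster.local"

dots=$(printf '%s' "$name" | tr -cd '.' | wc -c)
if [ "$dots" -lt 5 ]; then
  for d in $search_domains; do
    echo "NXDOMAIN: $name.$d"   # each search-domain attempt fails
  done
fi
echo "answer:   $name."          # the absolute query finally succeeds
```

With ndots:2, the name's 2 dots meet the threshold, so the resolver queries it as-is first and skips the wasted round-trips.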

Issue: Service Not Resolving

Symptom: nslookup api-service returns NXDOMAIN.

Debugging steps:

# 1. Does the Service exist?
kubectl get service api-service
# If not found, that's the problem

# 2. Try the fully qualified name
nslookup api-service.default.svc.cluster.local
# If this works but the short name doesn't, check search domains

# 3. Is the Service in a different namespace?
nslookup api-service.backend.svc.cluster.local

# 4. Does the Service have endpoints?
kubectl get endpoints api-service
# If empty, no Pods match the Service selector
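When the endpoints are empty, comparing the Service selector with actual Pod labels usually pinpoints the mismatch (a sketch — app=api is a hypothetical label; substitute the selector printed by the first command):

```shell
# What the Service selects
kubectl get service api-service -o jsonpath='{.spec.selector}'

# Which Pods actually carry those labels (app=api is a placeholder)
kubectl get pods -l app=api --show-labels

# Also check readiness: only Ready Pods become endpoints
kubectl get pods -l app=api
```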

Issue: CoreDNS Out of Memory

Symptom: CoreDNS Pods are OOMKilled.

Fix: Increase memory limits:

kubectl edit deployment coredns -n kube-system
# Increase resources.limits.memory

Or deploy NodeLocal DNSCache to reduce load on CoreDNS:

kubectl apply -f nodelocaldns.yaml
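The upstream NodeLocal DNSCache manifest contains placeholder variables that must be substituted before applying. A sketch following the upstream documentation — verify the kube-dns IP on your cluster, and note that 169.254.20.10 is the conventional link-local address for the node-local cache:

```shell
# Values for a typical cluster; confirm the first one with:
#   kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'
kubedns=10.96.0.10
domain=cluster.local
localdns=169.254.20.10

# Substitute the placeholders in the downloaded manifest (iptables mode)
sed -i "s/__PILLAR__LOCAL__DNS__/$localdns/g; \
        s/__PILLAR__DNS__DOMAIN__/$domain/g; \
        s/__PILLAR__DNS__SERVER__/$kubedns/g" nodelocaldns.yaml

kubectl apply -f nodelocaldns.yaml
```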

DNS Debugging Toolkit

# Quick DNS test
kubectl run dns-test --rm -it --image=busybox:1.36 -- nslookup kubernetes.default

# Full DNS debugging toolkit
kubectl run dns-debug --rm -it --image=nicolaka/netshoot -- bash
# Then use: dig, nslookup, host, drill

# Check CoreDNS metrics
kubectl port-forward -n kube-system svc/kube-dns 9153:9153 &
curl -s http://localhost:9153/metrics | grep coredns_dns_responses_total

Why Interviewers Ask This

Interviewers ask this because DNS issues are one of the most common sources of connectivity problems in Kubernetes, and debugging them efficiently is a critical operational skill.

Common Follow-Up Questions

What is the most common cause of DNS failures in Kubernetes?
CoreDNS Pods being unhealthy (CrashLoopBackOff, OOMKilled) or network policies blocking UDP port 53 to the kube-system namespace.
Why do external DNS lookups sometimes feel slow?
The ndots:5 setting causes names like google.com to be tried against each cluster search domain first (several failed lookups, one per search domain, before the real one). Reducing ndots or using FQDNs with a trailing dot (google.com.) fixes this.
How do you test DNS from outside a Pod?
Run a temporary debug Pod: kubectl run dns-test --rm -it --image=busybox -- nslookup api-service

Key Takeaways

  • Always check CoreDNS Pod health first — most DNS issues stem from unhealthy CoreDNS.
  • Use nslookup or dig from within a Pod to test DNS resolution from the Pod's perspective.
  • Network policies blocking UDP 53 to kube-system are a common cause of DNS failures.
