How Does Service Discovery Work in Kubernetes?

TL;DR

Kubernetes provides two built-in service discovery mechanisms: DNS-based discovery via CoreDNS (the primary method) and environment variable injection. DNS creates records for every Service, enabling Pods to find other Services by name without hard-coding IP addresses.

Service Discovery in Kubernetes

When you have dozens or hundreds of microservices in a cluster, each needs a way to find and communicate with the others. Kubernetes provides two built-in mechanisms for this: DNS-based discovery and environment variable injection.

DNS-Based Discovery (Primary Method)

CoreDNS is deployed as a cluster add-on and is the default DNS server in Kubernetes. Every Service created in the cluster automatically gets DNS records.

A Records for Services

Every ClusterIP Service gets an A record (and/or an AAAA record, depending on the Service's IP families):

<service-name>.<namespace>.svc.cluster.local  ->  <ClusterIP>

Example:

# From any Pod in the cluster
nslookup payment-service.production.svc.cluster.local
Name:    payment-service.production.svc.cluster.local
Address: 10.96.88.200
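
As a rough illustration of the naming scheme above, the following Python sketch assembles the FQDN for a Service. The helper name service_fqdn is hypothetical, and cluster.local is assumed as the cluster domain (it is the value used throughout this article, but a cluster can be configured with a different one).

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build <service>.<namespace>.svc.<cluster-domain> for a Service."""
    for label in (service, namespace):
        # Service and namespace names are single DNS labels, so they cannot contain dots.
        if not label or "." in label:
            raise ValueError(f"invalid DNS label: {label!r}")
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(service_fqdn("payment-service", "production"))
```

Building the name from its parts keeps the namespace explicit, which avoids surprises when the same code runs in a Pod from a different namespace.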

Short Names and Search Domains

The Pod's /etc/resolv.conf is configured with search domains, so you do not always need the full FQDN:

# /etc/resolv.conf inside a Pod in the 'production' namespace
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

This means:

| Query | Resolves To |
|---|---|
| payment-service | Works within the same namespace |
| payment-service.production | Works from any namespace |
| payment-service.production.svc.cluster.local | Full FQDN, works everywhere |
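
To make the search-domain behavior more concrete, here is a minimal Python sketch of how a resolver might expand a name using the search list and ndots value from the resolv.conf shown above. The function name search_candidates is hypothetical, and the ordering is modeled on glibc-style resolvers; other resolver implementations may differ in details.

```python
def search_candidates(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute, so the search list is skipped.
        return [name]
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search domains.
        return [absolute] + expanded
    # Fewer dots than ndots: try the search domains first, then the name as written.
    return expanded + [absolute]


SEARCH = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for name in ("payment-service", "payment-service.production"):
    print(name, "->", search_candidates(name, SEARCH))
```

In this model, payment-service matches on the first candidate (the Pod's own namespace), while payment-service.production only matches on the second candidate, after the first search domain has been tried and failed.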

SRV Records

Kubernetes also creates SRV records for named ports, following this pattern:

_<port-name>._<protocol>.<service>.<namespace>.svc.cluster.local

Example for a Service with a port named http:

nslookup -type=SRV _http._tcp.payment-service.production.svc.cluster.local
_http._tcp.payment-service.production.svc.cluster.local  SRV  0 100 80 payment-service.production.svc.cluster.local.

SRV records are useful for discovering both the hostname and port number dynamically.
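
The sketch below shows, under some assumptions, how a client might build the SRV query name and pull the port and target out of an answer like the one above. It uses only the Python standard library, so it parses the record's text form rather than performing the DNS query itself; the names SrvRecord, srv_query_name, and parse_srv_rdata are hypothetical.

```python
from typing import NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def srv_query_name(port_name: str, protocol: str, service: str, namespace: str,
                   cluster_domain: str = "cluster.local") -> str:
    """Build _<port-name>._<protocol>.<service>.<namespace>.svc.<cluster-domain>."""
    return f"_{port_name}._{protocol}.{service}.{namespace}.svc.{cluster_domain}"


def parse_srv_rdata(rdata: str) -> SrvRecord:
    """Parse the 'priority weight port target' fields of an SRV answer."""
    priority, weight, port, target = rdata.split()
    # Drop the trailing root dot so the target can be used directly as a hostname.
    return SrvRecord(int(priority), int(weight), int(port), target.rstrip("."))


record = parse_srv_rdata("0 100 80 payment-service.production.svc.cluster.local.")
print(srv_query_name("http", "tcp", "payment-service", "production"))
print(f"connect to {record.target}:{record.port}")
```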

Headless Service DNS

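For headless Services (clusterIP: None), DNS returns A records for each individual Pod instead of a single virtual IP. The per-Pod hostnames in the output below (cassandra-0, cassandra-1, ...) appear when the Pods are managed by a StatefulSet that uses this Service as its governing Service: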

nslookup cassandra.default.svc.cluster.local
Address 1: 10.244.1.5 cassandra-0.cassandra.default.svc.cluster.local
Address 2: 10.244.2.8 cassandra-1.cassandra.default.svc.cluster.local
Address 3: 10.244.3.12 cassandra-2.cassandra.default.svc.cluster.local
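
A client that wants to see every Pod behind a headless Service can ask the resolver for all addresses rather than just the first one. The following is a minimal sketch using Python's standard socket module; the function name resolve_all is hypothetical, and the commented-out call only makes sense from inside a cluster that actually has the cassandra Service.

```python
import socket


def resolve_all(hostname: str, port: int = 0) -> list[str]:
    """Return every distinct IP address that DNS returns for hostname."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# Inside the cluster (assumed Service name from the example above):
# pod_ips = resolve_all("cassandra.default.svc.cluster.local")
```

For a regular ClusterIP Service the same call would be expected to return only the Service's virtual IP, which is one quick way to tell the two Service types apart from a client's point of view.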

ExternalName DNS

ExternalName Services return a CNAME record:

nslookup external-db.production.svc.cluster.local
external-db.production.svc.cluster.local  CNAME  mydb.us-east-1.rds.amazonaws.com

Environment Variable Discovery (Legacy Method)

When a Pod starts, Kubernetes injects environment variables for every Service that exists in the same namespace at that point in time:

# Environment variables for a service named "redis-master" on port 6379
REDIS_MASTER_SERVICE_HOST=10.96.0.11
REDIS_MASTER_SERVICE_PORT=6379
REDIS_MASTER_PORT=tcp://10.96.0.11:6379
REDIS_MASTER_PORT_6379_TCP=tcp://10.96.0.11:6379
REDIS_MASTER_PORT_6379_TCP_PROTO=tcp
REDIS_MASTER_PORT_6379_TCP_PORT=6379
REDIS_MASTER_PORT_6379_TCP_ADDR=10.96.0.11
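
The variable names are derived from the Service name by upper-casing it and replacing dashes with underscores. As a small sketch of how an application might read them, assuming the hypothetical helper names service_env_prefix and service_from_env:

```python
import os


def service_env_prefix(service_name: str) -> str:
    """Upper-case the Service name and turn dashes into underscores."""
    return service_name.upper().replace("-", "_")


def service_from_env(service_name: str, env=None):
    """Return (host, port) from the injected variables, or None if they are absent."""
    env = os.environ if env is None else env
    prefix = service_env_prefix(service_name)
    host = env.get(f"{prefix}_SERVICE_HOST")
    port = env.get(f"{prefix}_SERVICE_PORT")
    if host is None or port is None:
        return None
    return host, int(port)


# Sample values copied from the listing above, passed in explicitly for illustration.
sample_env = {"REDIS_MASTER_SERVICE_HOST": "10.96.0.11", "REDIS_MASTER_SERVICE_PORT": "6379"}
print(service_from_env("redis-master", sample_env))
```

Returning None when the variables are missing lets the caller fall back to a DNS name instead of failing outright.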

Limitations of Environment Variables

  1. Order dependency -- The Service must exist before the Pod is created. If the Pod starts first, the variables are not injected.
  2. Static -- The values are set at Pod creation and never updated. If the Service's ClusterIP changes (e.g., after deletion and re-creation), the Pod has stale data.
  3. Namespace-scoped -- Only Services in the same namespace are injected.
  4. Environment pollution -- In clusters with many Services, the number of injected variables can become very large.

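You can disable injection of these per-Service variables for a Pod by setting enableServiceLinks to false. The variables for the kubernetes API Service itself (such as KUBERNETES_SERVICE_HOST) are still injected: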

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  enableServiceLinks: false
  containers:
    - name: app
      image: myapp:1.0

CoreDNS Configuration

CoreDNS configuration is stored in a ConfigMap:

kubectl get configmap coredns -n kube-system -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
          lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          fallthrough in-addr.arpa ip6.arpa
          ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
          max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }

Key settings:

| Directive | Purpose |
|---|---|
| kubernetes | Enables the Kubernetes plugin for Service/Pod DNS resolution |
| forward | Forwards non-cluster queries to upstream DNS |
| cache 30 | Caches responses for 30 seconds |
| loadbalance | Randomizes the order of A records in responses |
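
When auditing a cluster, it can help to check quickly that the directives above are actually present in the Corefile. The following Python sketch lists the plugin names found directly inside server blocks. It is a deliberately simplified parser, not CoreDNS's own: it assumes one directive per line with opening braces at the end of a line, as in the ConfigMap shown earlier, and the function name server_block_plugins is hypothetical.

```python
def server_block_plugins(corefile: str) -> list[str]:
    """List the plugin names that appear directly inside server blocks."""
    plugins = []
    depth = 0  # 0 = outside any block, 1 = inside a server block, 2+ = inside a plugin block
    for raw in corefile.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        if line == "}":
            depth -= 1
            continue
        if depth == 1:
            plugins.append(line.split()[0])
        if line.endswith("{"):
            depth += 1
    return plugins


COREFILE = """
.:53 {
    errors
    health {
      lameduck 5s
    }
    kubernetes cluster.local in-addr.arpa ip6.arpa {
      pods insecure
      ttl 30
    }
    forward . /etc/resolv.conf
    cache 30
}
"""

found = server_block_plugins(COREFILE)
missing = {"kubernetes", "forward"} - set(found)
print(found, "missing:", sorted(missing))
```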

Debugging DNS Issues

Step 1: Verify CoreDNS is Running

kubectl get pods -n kube-system -l k8s-app=kube-dns

Step 2: Test from a Debug Pod

apiVersion: v1
kind: Pod
metadata:
  name: dns-debug
spec:
  containers:
    - name: debug
      image: registry.k8s.io/e2e-test-images/jessie-dnsutils:1.3
      command: ["sleep", "3600"]
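
Once the Pod is running, run lookups from inside it. The second lookup targets the built-in kubernetes Service, which should resolve in any healthy cluster:

kubectl exec dns-debug -- nslookup payment-service.production.svc.cluster.local
kubectl exec dns-debug -- nslookup kubernetes.default.svc.cluster.local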

Step 3: Check the ndots Setting

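The default ndots:5 means any name with fewer than 5 dots is tried against each search domain before it is queried as written. An external name such as api.example.com (2 dots) therefore produces three failed lookups, one per search domain, before the real query goes out. To skip the search list for a single name, write it with a trailing dot (api.example.com.). For performance-sensitive applications, you can also lower ndots for the Pod: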

spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"

Step 4: Check CoreDNS Logs

kubectl logs -n kube-system -l k8s-app=kube-dns --tail=100

DNS vs. Environment Variables: When to Use Each

| Aspect | DNS | Environment Variables |
|---|---|---|
| Dynamic updates | Yes | No (static at Pod start) |
| Cross-namespace | Yes | No |
| Service creation order | Does not matter | Service must exist first |
| Port discovery | SRV records | *_SERVICE_PORT vars |
| Recommendation | Primary method | Legacy, avoid if possible |

Summary

Kubernetes service discovery is built on CoreDNS, which automatically registers DNS records for every Service. Pods can discover other Services by name without hard-coding IPs. Environment variable injection provides a legacy alternative but is static and limited. Understanding how DNS search domains, SRV records, and CoreDNS configuration work is essential for debugging connectivity issues in production clusters.

Why Interviewers Ask This

Service discovery is fundamental to microservice architecture. Interviewers want to see that candidates understand how Pods find each other and can troubleshoot DNS-related issues in production.

Common Follow-Up Questions

What is the fully qualified domain name (FQDN) format for a Kubernetes Service?
The format is <service-name>.<namespace>.svc.cluster.local. For example, api-server.production.svc.cluster.local.
What happens if CoreDNS is down?
New DNS lookups fail, so Pods cannot resolve Service names. Existing connections that already resolved the IP continue working. This makes CoreDNS a critical cluster component.
How do environment variables compare to DNS for service discovery?
Environment variables are injected at Pod creation time and are static. DNS is dynamic and reflects current state. DNS is the recommended approach because it handles Service changes automatically.

Key Takeaways

  • DNS-based discovery via CoreDNS is the primary and recommended service discovery mechanism.
  • ClusterIP and headless Services get A records, named ports get SRV records, and ExternalName Services get a CNAME record, all registered automatically.
  • Environment variables provide a legacy alternative but are static and order-dependent.