# What is etcd and what role does it play in Kubernetes?
etcd is a distributed, strongly consistent key-value store that serves as the backing store for all Kubernetes cluster data. Every object, configuration, and piece of state in the cluster is persisted in etcd, making it the single source of truth.
## Detailed Answer
etcd is an open-source, distributed key-value store developed by CoreOS (now part of Red Hat). In Kubernetes, it serves as the persistent storage backend for all cluster data. When you create a Deployment, a Service, a ConfigMap, or any other Kubernetes object, it is serialized and stored in etcd. When you query the API server, it reads from etcd (or its watch cache) to return the current state.
### How Kubernetes Uses etcd
The kube-apiserver is the only Kubernetes component that communicates directly with etcd. All other components (scheduler, controller manager, kubelet) interact with cluster state exclusively through the API server. This design provides a single point of access control and ensures consistent serialization of data.
Data in etcd is organized under a key prefix, typically /registry/. For example:
- `/registry/pods/default/my-pod`: a Pod named "my-pod" in the default namespace
- `/registry/deployments/production/web-app`: a Deployment in the production namespace
- `/registry/services/kube-system/kube-dns`: the kube-dns Service
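The key layout follows a predictable pattern, `/registry/<resource-plural>/<namespace>/<name>` for namespaced objects, which a quick sketch can confirm:

```shell
# Compose the storage key for a namespaced object; the pattern is
# /registry/<resource-plural>/<namespace>/<name>.
# (Cluster-scoped objects, such as Nodes, simply omit the namespace segment.)
resource=pods
namespace=default
name=my-pod
key="/registry/${resource}/${namespace}/${name}"
echo "$key"   # /registry/pods/default/my-pod
```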
### Raft Consensus
etcd uses the Raft consensus algorithm to replicate data across all members of the cluster. Raft ensures that as long as a majority (quorum) of members are available, the cluster can accept writes. For a 3-member etcd cluster, it can tolerate 1 failure. For 5 members, it can tolerate 2 failures.
| Cluster Size | Quorum | Failure Tolerance |
|--------------|--------|-------------------|
| 1            | 1      | 0                 |
| 3            | 2      | 1                 |
| 5            | 3      | 2                 |
| 7            | 4      | 3                 |
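The table follows directly from the majority rule: quorum is floor(n/2) + 1, and failure tolerance is whatever remains. A minimal sketch:

```shell
# Quorum is a strict majority of members: floor(n/2) + 1.
# Failure tolerance is the rest: n - quorum.
for n in 1 3 5 7; do
  quorum=$(( n / 2 + 1 ))
  tolerance=$(( n - quorum ))
  echo "size=${n} quorum=${quorum} tolerance=${tolerance}"
done
```

This also shows why etcd clusters use odd sizes: a 4-member cluster needs a quorum of 3, so it tolerates only 1 failure, the same as a 3-member cluster, while adding more replication overhead.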
### Backup and Restore
Backing up etcd is the single most important disaster recovery procedure in Kubernetes:
```bash
# Create a snapshot backup
ETCDCTL_API=3 etcdctl snapshot save /backup/etcd-snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify the snapshot
ETCDCTL_API=3 etcdctl snapshot status /backup/etcd-snapshot.db --write-out=table

# Restore from a snapshot (stop the API server first)
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored \
  --initial-cluster=controlplane=https://10.0.0.10:2380 \
  --initial-advertise-peer-urls=https://10.0.0.10:2380 \
  --name=controlplane
```
### Monitoring etcd Health
```bash
# Check etcd cluster health
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# List etcd members
ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  --write-out=table
```
### Inspecting Data Stored in etcd
While you should not directly modify data in etcd, inspecting it can be useful for debugging:
```bash
# List the first 20 keys under /registry (pass the same --endpoints and
# certificate flags as in the examples above)
ETCDCTL_API=3 etcdctl get /registry --prefix --keys-only | head -20

# Read a specific key (the value is protobuf-encoded, not plain JSON)
ETCDCTL_API=3 etcdctl get /registry/pods/default/my-pod
```
### Performance Tuning
etcd performance directly impacts the responsiveness of the entire cluster. Key considerations:
- Storage: Use fast SSDs with low-latency I/O. etcd uses a write-ahead log (WAL) and performs periodic fsync operations. If `wal_fsync_duration_seconds` consistently exceeds 10ms, disk I/O is a bottleneck.
- Network: etcd members communicate over gRPC. Network latency between members should be under 10ms for reliable operation.
- Compaction: etcd stores all revisions of every key, so the keyspace grows over time. Kubernetes configures automatic compaction, but you should verify it is working by monitoring `etcd_db_total_size_in_bytes`.
- Defragmentation: After compaction frees space logically, defragmentation reclaims it on disk. Schedule periodic defragmentation during maintenance windows.
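The compaction and defragmentation steps map to two etcdctl commands. A hedged sketch (the dry-run wrapper and the revision number are illustrative; in practice, pass the same `--endpoints` and certificate flags as in the backup example and use the cluster's actual current revision):

```shell
# DRY_RUN=1 (the default here) only prints the commands instead of running
# them, so the sketch is safe to execute without a live cluster.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi
}

# Compact the keyspace up to a revision (123456 is a placeholder; read the
# real value from `etcdctl endpoint status`), then reclaim the freed space.
run etcdctl compaction 123456
run etcdctl defrag
```

Note that defragmentation blocks writes on the member being defragmented, which is why it belongs in a maintenance window rather than a cron job during peak hours.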
### etcd in the Static Pod Manifest
On kubeadm clusters, etcd runs as a static pod:
```yaml
# /etc/kubernetes/manifests/etcd.yaml (excerpt)
spec:
  containers:
  - command:
    - etcd
    - --data-dir=/var/lib/etcd
    - --listen-client-urls=https://127.0.0.1:2379,https://10.0.0.10:2379
    - --advertise-client-urls=https://10.0.0.10:2379
    - --listen-peer-urls=https://10.0.0.10:2380
    - --initial-advertise-peer-urls=https://10.0.0.10:2380
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
```
The /var/lib/etcd directory holds the actual database and WAL files. Losing this directory without a backup means losing all cluster state.
## Why Interviewers Ask This
Interviewers ask about etcd to assess whether a candidate understands where cluster state lives and the implications for backup, disaster recovery, and high availability. Misunderstanding etcd often leads to data loss scenarios in production.
## Key Takeaways
- etcd is the only stateful component in the control plane and requires careful operational attention
- It uses the Raft consensus algorithm requiring a majority of members to be available for writes
- Regular backups of etcd are non-negotiable in production environments