How do you set up a highly available Kubernetes cluster?
A highly available Kubernetes cluster requires multiple control plane nodes (minimum 3) with a load balancer in front of the API servers, an etcd cluster with odd-numbered members for quorum, leader election for the scheduler and controller manager, and worker nodes spread across failure domains.
Detailed Answer
A production Kubernetes cluster must tolerate component failures without downtime. High availability (HA) is achieved by running redundant instances of every control plane component and distributing them across failure domains (availability zones, racks, or data centers).
HA Architecture Overview
              Load Balancer (L4/TCP)
             /          |          \
       +-------+    +-------+    +-------+
       | CP-1  |    | CP-2  |    | CP-3  |
       |  api  |    |  api  |    |  api  |
       | sched |    | sched |    | sched |
       |  cm   |    |  cm   |    |  cm   |
       | etcd  |    | etcd  |    | etcd  |
       +-------+    +-------+    +-------+
         AZ-1         AZ-2         AZ-3
           |            |            |
       +------+------+------+------+------+
       | W-1  | W-2  | W-3  | W-4  | W-5  |
       +------+------+------+------+------+
Component-Level HA
kube-apiserver -- The API server is stateless (all persistent state lives in etcd), so multiple instances run active-active behind a layer-4 (TCP) load balancer that distributes requests across every healthy endpoint.
# kubeadm HA setup with a load balancer endpoint
kubeadm init \
  --control-plane-endpoint "api.k8s.example.com:6443" \
  --upload-certs

# Join additional control plane nodes
kubeadm join api.k8s.example.com:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --control-plane \
  --certificate-key <key>
etcd -- Run an odd number of members (typically 3 or 5). etcd uses Raft consensus, so a write commits only when a majority (quorum) of members agree. A 3-member cluster tolerates 1 failure, a 5-member cluster tolerates 2, and an even member count adds no fault tolerance over the next-lower odd count.
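The quorum arithmetic can be sketched directly (a minimal illustration of the formula, not etcd code): quorum = floor(n/2) + 1, and tolerated failures = n - quorum.

```sh
#!/bin/sh
# Raft quorum: a write commits only when a majority of members ack it.
# quorum = floor(n/2) + 1; tolerated failures = n - quorum.
for n in 1 2 3 4 5 6 7; do
  quorum=$(( n / 2 + 1 ))
  echo "members=$n quorum=$quorum tolerated_failures=$(( n - quorum ))"
done
```

Note that 4 members tolerate only 1 failure (the same as 3) and 6 tolerate only 2 (the same as 5), which is why odd cluster sizes are recommended.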
kube-scheduler and kube-controller-manager -- Use leader election (backed by Lease objects in the kube-system namespace) so only one instance is active at a time; the standby instances take over if the leader's lease expires.
# Verify leader election leases
kubectl get leases -n kube-system

# Example output:
# NAME                      HOLDER                 AGE
# kube-controller-manager   cp-1_abc123-def456     5d
# kube-scheduler            cp-2_ghi789-jkl012     5d
Stacked vs. External etcd
Stacked etcd topology -- etcd runs on the same nodes as other control plane components. This is simpler to set up and requires fewer machines, but a node failure loses both a control plane member and an etcd member simultaneously.
External etcd topology -- etcd runs on dedicated nodes separate from the Kubernetes control plane. This provides better fault isolation and allows independent scaling, but requires more infrastructure.
# kubeadm config for external etcd
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
etcd:
  external:
    endpoints:
      - https://etcd-1.example.com:2379
      - https://etcd-2.example.com:2379
      - https://etcd-3.example.com:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
Load Balancer Configuration
The load balancer should operate at layer 4 (TCP) and pass traffic through to the API servers rather than terminating TLS, since clients authenticate with client certificates end to end. Common choices include HAProxy, nginx (stream module), and cloud provider load balancers:
# Example HAProxy configuration for API server HA
frontend k8s-api
    bind *:6443
    mode tcp
    default_backend k8s-api-backend

backend k8s-api-backend
    mode tcp
    balance roundrobin
    option tcp-check
    server cp-1 10.0.1.10:6443 check fall 3 rise 2
    server cp-2 10.0.2.10:6443 check fall 3 rise 2
    server cp-3 10.0.3.10:6443 check fall 3 rise 2
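The load balancer itself must not become a single point of failure. One common pattern (an addition here, not part of the HAProxy config above) is running HAProxy on two nodes with keepalived managing a floating virtual IP via VRRP; a minimal sketch, assuming interface eth0 and VIP 10.0.0.100:

```
# /etc/keepalived/keepalived.conf on the primary node
# (the backup node uses state BACKUP and a lower priority)
vrrp_instance K8S_API {
    state MASTER
    interface eth0              # assumed NIC name
    virtual_router_id 51
    priority 100
    advert_int 1
    virtual_ipaddress {
        10.0.0.100              # floating VIP the API endpoint resolves to
    }
}
```

If the primary node fails, the backup claims the VIP and traffic continues flowing without a DNS change.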
Worker Node HA
Worker nodes should be spread across failure domains using topology spread constraints:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.27
          resources:
            requests:
              cpu: "250m"
              memory: "128Mi"
Pod Disruption Budgets
PDBs protect applications during voluntary disruptions (node drains, upgrades):
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: "50%"
  selector:
    matchLabels:
      app: web
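How many pods a drain may evict follows from the budget. A minimal sketch of the arithmetic (illustrative only, not the actual controller code), assuming the documented behavior that a percentage minAvailable is rounded up to a whole pod count:

```sh
#!/bin/sh
# allowed disruptions = replicas - ceil(replicas * minAvailable%)
allowed_disruptions() {
  replicas=$1; pct=$2
  required=$(( (replicas * pct + 99) / 100 ))   # integer ceil
  echo $(( replicas - required ))
}
allowed_disruptions 6 50   # prints 3 (3 of 6 pods must stay available)
allowed_disruptions 5 50   # prints 2 (ceil(2.5) = 3 pods must stay)
```

So for the 6-replica Deployment above, a rolling node drain can evict at most 3 web pods at any moment.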
Validating HA Setup
# Check all control plane components are running
kubectl get pods -n kube-system -l tier=control-plane -o wide
# Verify etcd cluster health
ETCDCTL_API=3 etcdctl endpoint health \
  --endpoints=https://10.0.1.10:2379,https://10.0.2.10:2379,https://10.0.3.10:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# Verify leader election is working
kubectl get lease -n kube-system kube-controller-manager -o jsonpath='{.spec.holderIdentity}'
kubectl get lease -n kube-system kube-scheduler -o jsonpath='{.spec.holderIdentity}'
# Check node distribution across zones
kubectl get nodes -L topology.kubernetes.io/zone
# Simulate a control plane failure and verify cluster continues operating
# (do this in a test environment!)
Managed Kubernetes HA
Cloud providers simplify HA significantly. With EKS, GKE, or AKS, the control plane is fully managed and distributed across availability zones automatically. Your responsibility is ensuring worker nodes are spread across zones using node groups or node pools configured for multiple AZs.
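As one illustration, an eksctl ClusterConfig can pin an EKS managed node group to multiple availability zones; a hedged sketch (cluster name, region, zones, and instance type are all illustrative):

```yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ha-demo            # illustrative name
  region: us-east-1
managedNodeGroups:
  - name: workers
    instanceType: m5.large
    desiredCapacity: 6
    availabilityZones: ["us-east-1a", "us-east-1b", "us-east-1c"]
```

GKE offers the equivalent via regional clusters (nodes replicated per zone), and AKS via availability-zone-enabled node pools.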
Why Interviewers Ask This
HA architecture questions reveal whether a candidate can design production-grade clusters. Interviewers evaluate understanding of failure modes, quorum requirements, load balancing strategies, and the trade-offs between stacked and external etcd topologies.
Key Takeaways
- HA requires redundancy at every layer: API server, etcd, scheduler, controller manager, and worker nodes
- etcd quorum (majority of members) is the critical factor; losing quorum means losing the ability to write cluster state
- A load balancer in front of API server instances is essential for transparent failover