What Are Topology Spread Constraints in Kubernetes?
Topology spread constraints control how Pods are distributed across topology domains (zones, nodes, racks). Unlike pod anti-affinity, which is binary (a Pod either may or may not share a domain), topology spread uses maxSkew to define how unevenly Pods can be distributed. This enables fine-grained, even workload distribution for high availability.
Detailed Answer
Why Topology Spread Constraints?
Pod anti-affinity can prevent two Pods from sharing the same node or zone, but it cannot control the distribution ratio. If you have 6 replicas across 3 zones, anti-affinity might place 4 in zone-a and 1 each in zone-b and zone-c. Topology spread constraints ensure an even 2-2-2 distribution.
The Constraint Fields
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web
```
| Field | Purpose |
|---|---|
| maxSkew | Maximum allowed difference in Pod count between any two domains |
| topologyKey | Node label defining the topology domain |
| whenUnsatisfiable | DoNotSchedule (hard) or ScheduleAnyway (soft) |
| labelSelector | Which Pods to count when calculating the skew |
| matchLabelKeys | (1.27+) Use Pod label values to scope the constraint |
| minDomains | Minimum number of eligible domains required |
How maxSkew Works
Suppose you have 3 zones and the current Pod distribution for app: web is:
- zone-a: 2 Pods
- zone-b: 2 Pods
- zone-c: 1 Pod
With maxSkew: 1, the next Pod must go to zone-c (bringing it to 2-2-2). Placing it in zone-a or zone-b would create a skew of 2 (3-1=2), which violates maxSkew.
With maxSkew: 2, the Pod can go to any zone because the worst case would be 3-2-1 (skew of 2), which equals the maximum allowed.
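The arithmetic above can be sketched in a few lines of Python. This is a simplified, hypothetical model of the scheduler's check (the function name and structure are illustrative, not the real scheduler code): a candidate zone is allowed if placing one more Pod there keeps the difference between the fullest and emptiest domains within maxSkew.

```python
def allowed_zones(distribution, max_skew):
    """Return zones where one more Pod can land without exceeding max_skew.

    distribution maps zone name -> current Pod count for the labelSelector.
    """
    allowed = []
    for zone in distribution:
        # Simulate placing the next Pod in this zone.
        after = dict(distribution)
        after[zone] += 1
        # Skew is the gap between the fullest and emptiest domains.
        skew = max(after.values()) - min(after.values())
        if skew <= max_skew:
            allowed.append(zone)
    return sorted(allowed)

current = {"zone-a": 2, "zone-b": 2, "zone-c": 1}

print(allowed_zones(current, max_skew=1))  # ['zone-c']
print(allowed_zones(current, max_skew=2))  # ['zone-a', 'zone-b', 'zone-c']
```

With maxSkew: 1, only zone-c keeps the distribution within bounds; with maxSkew: 2, a 3-2-1 layout is acceptable, so every zone qualifies.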
Basic Zone Spreading
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: web
      containers:
      - name: web
        image: nginx:1.25
```
This distributes 6 replicas evenly: 2 per zone across 3 zones.
Multiple Constraints
You can combine constraints to spread across both zones and nodes:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 12
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: api
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        labelSelector:
          matchLabels:
            app: api
      containers:
      - name: api
        image: api-server:v3
```
The first constraint (hard) ensures even zone distribution. The second constraint (soft) tries to spread evenly across nodes within each zone but allows imbalance if needed.
whenUnsatisfiable Options
DoNotSchedule (Hard)
The Pod will not be scheduled if it would violate the maxSkew constraint. The Pod stays Pending.
```yaml
whenUnsatisfiable: DoNotSchedule
```
ScheduleAnyway (Soft)
The scheduler tries to satisfy the constraint but schedules the Pod on the least-violating node if it cannot achieve the desired skew.
```yaml
whenUnsatisfiable: ScheduleAnyway
```
A common pattern is DoNotSchedule for the zone-level constraint and ScheduleAnyway for the node-level constraint. This keeps zone spreading strict while node spreading stays best-effort.
matchLabelKeys (Kubernetes 1.27+)
matchLabelKeys scopes the constraint to Pods with the same label values for the specified keys. This is useful during rolling updates to ensure new revision Pods are spread independently of old revision Pods:
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: web
  matchLabelKeys:
  - pod-template-hash  # Only count Pods from the same ReplicaSet
```
Without this, during a rolling update, the scheduler counts both old and new revision Pods when calculating skew, which can lead to uneven distribution of the new revision.
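The effect of scoping the count can be sketched as follows. This is a hypothetical illustration (the pod list, the "old"/"new" hash values, and the function are invented for this example): without scoping, a new-revision Pod in zone-a is hidden among old-revision Pods, so the scheduler sees zone-a as overloaded even though the rollout itself has only one Pod there.

```python
from collections import Counter

# Each Pod is (zone, pod-template-hash); "old"/"new" stand in for real hashes.
pods = [
    ("zone-a", "old"), ("zone-a", "old"), ("zone-a", "new"),
    ("zone-b", "old"), ("zone-b", "old"),
    ("zone-c", "old"),
]

def per_zone_counts(pods, template_hash=None):
    """Count Pods per zone, optionally scoped to one revision
    (roughly what matchLabelKeys: [pod-template-hash] does)."""
    return Counter(zone for zone, h in pods
                   if template_hash is None or h == template_hash)

# Unscoped: old and new revisions are counted together.
print(per_zone_counts(pods))         # Counter({'zone-a': 3, 'zone-b': 2, 'zone-c': 1})
# Scoped to the new revision: only the rollout's own skew is measured.
print(per_zone_counts(pods, "new"))  # Counter({'zone-a': 1})
```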
minDomains
minDomains specifies the minimum number of eligible topology domains. If fewer eligible domains exist, the scheduler treats the global minimum Pod count as zero, as if the missing domains were present but empty. minDomains can only be used together with whenUnsatisfiable: DoNotSchedule:
```yaml
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  minDomains: 3
  labelSelector:
    matchLabels:
      app: web
```
If only 2 zones have eligible nodes, the scheduler treats it as if there are 3 zones, which can make the constraint unsatisfiable and keep Pods Pending. This prevents concentrating workload in too few zones.
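The arithmetic behind this can be sketched as follows (a simplified, hypothetical model; the function name is invented for this illustration). Padding the count list with zeros for missing domains is what makes an otherwise balanced two-zone layout violate maxSkew: 1:

```python
def skew_with_min_domains(counts, min_domains):
    """Skew between the fullest domain and the global minimum,
    treating domains missing relative to min_domains as empty."""
    values = list(counts.values())
    # Pad with zero-Pod virtual domains if fewer domains exist.
    values += [0] * max(0, min_domains - len(values))
    return max(values) - min(values)

two_zones = {"zone-a": 3, "zone-b": 3}

print(skew_with_min_domains(two_zones, min_domains=2))  # 0: satisfiable
print(skew_with_min_domains(two_zones, min_domains=3))  # 3: violates maxSkew: 1
```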
Cluster-Level Defaults
Administrators can set default topology spread constraints for the entire cluster via the scheduler configuration:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: PodTopologySpread
    args:
      defaultConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      defaultingType: List
```
This applies a zone-spread constraint to all Pods that do not define their own topologySpreadConstraints.
Debugging Topology Spread
```bash
# Check current Pod distribution
kubectl get pods -l app=web -o wide

# Count Pods per zone
kubectl get pods -l app=web -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | \
  while read node; do
    kubectl get node $node -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'; echo
  done | sort | uniq -c

# Check scheduling events for messages like
# "didn't match pod topology spread constraints"
kubectl describe pod web-abc123 | grep -A 10 Events
```
Comparison with Anti-Affinity
| Feature | Pod Anti-Affinity | Topology Spread |
|---|---|---|
| Goal | Prevent co-location | Ensure even distribution |
| Granularity | Binary (yes/no per domain) | Numeric (maxSkew) |
| Multiple replicas per domain | Not with required rules | Yes, controlled by maxSkew |
| Performance | Evaluates all matching Pods | More efficient algorithm |
| Rolling updates | No revision awareness | matchLabelKeys support |
For most production deployments, topology spread constraints are the better choice for distributing replicas across failure domains.
Why Interviewers Ask This
Interviewers ask this to see if you can design deployments that are resilient to zone and node failures while efficiently utilizing cluster resources.
Key Takeaways
- maxSkew controls the maximum imbalance across topology domains
- DoNotSchedule is a hard constraint; ScheduleAnyway is a soft preference
- Topology spread is more flexible than pod anti-affinity for even distribution