How Do Pod Topology Spread Constraints Affect Scheduling?
Pod topology spread constraints control how Pods are distributed across topology domains during scheduling. They interact with other scheduling rules like node affinity and taints, and can be configured as cluster-wide defaults to enforce even distribution without per-Deployment configuration.
Detailed Answer
While the pods topic covers the basics of topology spread constraints, this answer focuses on how they interact with the scheduler, advanced parameters, cluster defaults, and real-world scheduling scenarios.
Scheduling Pipeline Interaction
Topology spread constraints are evaluated across the PreFilter, Filter, and Score phases of scheduling:
1. PreFilter: Calculate existing Pod distribution
2. Filter: Eliminate nodes where maxSkew would be violated (DoNotSchedule)
3. Score: Prefer nodes that minimize skew (ScheduleAnyway)
The constraint works after node affinity and taint filtering. If node affinity limits eligible nodes to zone-a, a zone-level spread constraint has no nodes in other zones to spread to.
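The Filter-phase check can be sketched as follows. This is an illustrative simplification, not the scheduler's actual code; the domain names and Pod counts are made up:

```python
# Sketch of the DoNotSchedule filter check: a candidate node's domain is
# rejected if placing the Pod there would push the skew above maxSkew.
from collections import Counter

def violates_max_skew(counts, candidate_domain, max_skew):
    """Return True if scheduling into candidate_domain would exceed max_skew."""
    after = Counter(counts)
    after[candidate_domain] += 1  # simulate placing the new Pod
    skew = max(after.values()) - min(after.values())
    return skew > max_skew

counts = {"zone-a": 3, "zone-b": 2, "zone-c": 2}  # existing matching Pods
print(violates_max_skew(counts, "zone-a", 1))  # True: 4/2/2 has skew 2
print(violates_max_skew(counts, "zone-b", 1))  # False: 3/3/2 has skew 1
```

With `DoNotSchedule`, nodes in zone-a would be filtered out entirely; with `ScheduleAnyway`, they would merely score lower.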
Interaction with Node Affinity
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: ["compute"]
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: web
If "compute" nodes exist in zones a, b, and c, Pods spread across all three. If "compute" nodes exist only in zone-a, the spread constraint effectively does nothing — there is only one domain.
minDomains Parameter
minDomains (beta since 1.25, stable since 1.30) prevents the constraint from being vacuously satisfied when there are too few topology domains:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
minDomains: 3
labelSelector:
matchLabels:
app: web
Without minDomains, if the cluster has only 1 zone, all Pods land there and maxSkew is trivially satisfied (0 skew). With minDomains: 3, the scheduler treats missing domains as having 0 Pods, potentially making the skew exceed maxSkew and blocking scheduling until 3 zones exist.
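The effect of minDomains on the skew computation can be sketched like this. The simplified rule (global minimum becomes 0 when the cluster has fewer eligible domains than minDomains) is an illustrative assumption; the data is made up:

```python
# Sketch of how minDomains changes the skew computation: when the number of
# domains with eligible nodes is below minDomains, the global minimum Pod
# count is treated as 0, as if the missing domains existed with 0 Pods.
def skew_with_min_domains(counts, min_domains):
    global_min = 0 if len(counts) < min_domains else min(counts.values())
    return max(counts.values()) - global_min

# One-zone cluster running 4 matching Pods:
print(skew_with_min_domains({"zone-a": 4}, min_domains=3))  # 4 — violates maxSkew: 1
print(skew_with_min_domains({"zone-a": 4}, min_domains=1))  # 0 — trivially satisfied
```

This is why a `DoNotSchedule` constraint with `minDomains: 3` keeps Pods pending in a single-zone cluster instead of silently piling them into one zone.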
matchLabelKeys for Rolling Updates
During a rolling update, the Deployment creates a new ReplicaSet. Old and new Pods have different pod-template-hash labels:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: web
matchLabelKeys:
- pod-template-hash
matchLabelKeys (beta since 1.27) tells the constraint to only count Pods with the same pod-template-hash as the Pod being scheduled. This means:
- New Pods are spread evenly across zones independently of old Pods
- Old Pods being terminated do not affect new Pod placement
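The filtering step can be sketched as follows. The Pod records and hash values are illustrative, not real cluster data:

```python
# Sketch of the matchLabelKeys effect: only Pods whose value for each listed
# key matches the incoming Pod's value are counted toward skew.
from collections import Counter

def counted_pods(pods, incoming_labels, match_label_keys):
    return [
        p for p in pods
        if all(p["labels"].get(k) == incoming_labels.get(k) for k in match_label_keys)
    ]

pods = [
    {"zone": "a", "labels": {"app": "web", "pod-template-hash": "old111"}},
    {"zone": "b", "labels": {"app": "web", "pod-template-hash": "old111"}},
    {"zone": "a", "labels": {"app": "web", "pod-template-hash": "new222"}},
]
incoming = {"app": "web", "pod-template-hash": "new222"}
counted = counted_pods(pods, incoming, ["pod-template-hash"])
print(Counter(p["zone"] for p in counted))  # only the new-ReplicaSet Pod counts
```

Without matchLabelKeys, the two old-ReplicaSet Pods would also be counted, and their gradual termination during the rollout would distort placement of the new Pods.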
Cluster-Wide Default Constraints
Configure default topology spread constraints in the scheduler config to enforce zone balance across all workloads:
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
pluginConfig:
- name: PodTopologySpread
args:
defaultConstraints:
- maxSkew: 3
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
- maxSkew: 5
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
defaultingType: List
These apply to all Pods that do not define their own topology spread constraints. Use ScheduleAnyway for defaults to avoid blocking Pod scheduling unexpectedly.
nodeAffinityPolicy and nodeTaintsPolicy
These fields (beta since 1.26, enabled by default) control how node filtering interacts with topology spread:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: web
nodeAffinityPolicy: Honor # Only count Pods on nodes matching Pod's nodeAffinity
nodeTaintsPolicy: Honor # Only count Pods on nodes the Pod tolerates
| Policy | Behavior |
|--------|----------|
| Honor | Exclude nodes that don't match the Pod's affinity/tolerations from the skew calculation (default for nodeAffinityPolicy) |
| Ignore | Include all nodes in the skew calculation (default for nodeTaintsPolicy) |
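The difference can be sketched with a toy taint check. The node records and the simplified "taints must be a subset of tolerations" rule are illustrative assumptions:

```python
# Sketch of nodeTaintsPolicy Honor vs Ignore: with Honor, nodes whose taints
# the Pod does not tolerate are excluded from the domain counts entirely.
def domain_counts(nodes, tolerated_taints, policy):
    counts = {}
    for node in nodes:
        if policy == "Honor" and not set(node["taints"]) <= set(tolerated_taints):
            continue  # Pod cannot land here: node does not define a domain
        counts[node["zone"]] = counts.get(node["zone"], 0) + node["pods"]
    return counts

nodes = [
    {"zone": "a", "taints": [], "pods": 3},
    {"zone": "b", "taints": ["dedicated=gpu:NoSchedule"], "pods": 0},
]
print(domain_counts(nodes, tolerated_taints=[], policy="Ignore"))  # {'a': 3, 'b': 0}
print(domain_counts(nodes, tolerated_taints=[], policy="Honor"))   # {'a': 3}
```

Under Ignore, zone-b counts as an empty domain (skew 3, potentially blocking zone-a); under Honor, zone-b's tainted node drops out, leaving a single domain with skew 0.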
Scheduling Performance Impact
Topology spread adds computational cost to scheduling:
- Filter phase: The scheduler must evaluate Pod distribution across all topology domains
- Score phase: The scheduler ranks nodes by how much they improve balance
For large clusters (10,000+ Pods), heavy use of topology spread can slow scheduling. Mitigate by:
- Using ScheduleAnyway instead of DoNotSchedule where possible
- Limiting constraints to 1-2 topology keys
- Scoping labelSelector narrowly
Debugging Topology Spread Scheduling Failures
# Pod stuck pending — check events
kubectl describe pod web-abc -n production
# Events: "2 node(s) didn't match pod topology spread constraints"
# Check current distribution
kubectl get pods -l app=web -o wide --sort-by='.spec.nodeName'
# Check node topology labels
kubectl get nodes --show-labels | grep topology.kubernetes.io/zone
# Verify the constraint configuration
kubectl get deployment web -o jsonpath='{.spec.template.spec.topologySpreadConstraints}' | jq .
Real-World Configuration Example
A production Deployment with comprehensive scheduling constraints:
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-server
spec:
replicas: 9
selector:
matchLabels:
app: api
template:
metadata:
labels:
app: api
spec:
topologySpreadConstraints:
- maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: DoNotSchedule
labelSelector:
matchLabels:
app: api
matchLabelKeys:
- pod-template-hash
- maxSkew: 2
topologyKey: kubernetes.io/hostname
whenUnsatisfiable: ScheduleAnyway
labelSelector:
matchLabels:
app: api
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: node-type
operator: In
values: ["compute"]
containers:
- name: api
image: api-server:3.0
resources:
requests:
cpu: "500m"
memory: "512Mi"
This ensures:
- 3 Pods per zone (hard constraint, per ReplicaSet)
- At most 2-Pod difference between nodes (soft constraint)
- Only runs on "compute" nodes
Why Interviewers Ask This
This question explores the scheduling implications of topology spread — how it interacts with other constraints, impacts scheduling performance, and can be set as cluster defaults.
Key Takeaways
- Topology spread constraints work after node affinity filtering — they only spread across nodes that pass all other filters.
- Cluster-wide default constraints provide a safety net without requiring per-Deployment configuration.
- Use matchLabelKeys to ensure correct behavior during rolling updates.