What Are Topology Spread Constraints in Kubernetes?

TL;DR

Topology spread constraints control how Pods are distributed across topology domains (zones, nodes, racks). Unlike pod anti-affinity, which is a binary yes/no per domain, topology spread uses maxSkew to define how unevenly Pods may be distributed. This enables fine-grained, even workload distribution for high availability.

Detailed Answer

Why Topology Spread Constraints?

Pod anti-affinity can prevent two Pods from sharing the same node or zone, but it cannot control the distribution ratio. With a required anti-affinity rule on the zone key, 6 replicas across 3 zones cap out at one Pod per zone (the rest stay Pending); with a preferred rule, you might end up with 4 Pods in zone-a and 1 each in zone-b and zone-c. Topology spread constraints can enforce an even 2-2-2 distribution.
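For contrast, a preferred (soft) anti-affinity rule on the zone key looks like the sketch below. It can only weight against co-location in a zone; there is no field to express a target count per domain (the app: web selector mirrors the examples that follow):

```yaml
# Soft anti-affinity sketch: discourages zone co-location,
# but sets no numeric bound on Pods per zone
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: topology.kubernetes.io/zone
          labelSelector:
            matchLabels:
              app: web
```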

The Constraint Fields

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web

| Field | Purpose |
|---|---|
| maxSkew | Maximum allowed difference in Pod count between any two domains |
| topologyKey | Node label defining the topology domain |
| whenUnsatisfiable | DoNotSchedule (hard) or ScheduleAnyway (soft) |
| labelSelector | Which Pods to count when calculating the skew |
| matchLabelKeys | (1.27+) Use Pod label values to scope the constraint |
| minDomains | Minimum number of eligible domains required |

How maxSkew Works

Suppose you have 3 zones and the current Pod distribution for app: web is:

zone-a: 2 Pods
zone-b: 2 Pods
zone-c: 1 Pod

With maxSkew: 1, the next Pod must go to zone-c (bringing it to 2-2-2). Placing it in zone-a or zone-b would create a skew of 2 (3-1=2), which violates maxSkew.

With maxSkew: 2, the Pod can go to any zone because the worst case would be 3-2-1 (skew of 2), which equals the maximum allowed.
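The skew arithmetic can be sketched in a few lines of shell; the counts below are the hypothetical worst case from the text (3-2-1 after placing the next Pod in zone-a):

```shell
#!/usr/bin/env bash
# Skew = (max Pods in any zone) - (min Pods in any zone)
counts=(3 2 1)   # hypothetical worst case: zone-a=3, zone-b=2, zone-c=1
max=${counts[0]}; min=${counts[0]}
for c in "${counts[@]}"; do
  (( c > max )) && max=$c
  (( c < min )) && min=$c
done
echo "skew=$((max - min))"   # prints skew=2: violates maxSkew: 1, allowed by maxSkew: 2
```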

Basic Zone Spreading

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: web
          image: nginx:1.25

In a cluster with three zones, this distributes the 6 replicas evenly: 2 per zone.
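When the replica count does not divide evenly, maxSkew: 1 still bounds the imbalance: 7 replicas across 3 zones settle at 3-2-2 (skew 1). A toy shell sketch of greedy least-loaded placement illustrates this; it is an illustration only, not the real scheduler algorithm:

```shell
#!/usr/bin/env bash
# Toy placement sketch: put each new replica in the currently least-loaded zone.
# With 7 replicas and 3 zones this yields a 3-2-2 split (skew 1).
declare -A zone=( [a]=0 [b]=0 [c]=0 )
for i in $(seq 1 7); do
  target=a
  for z in a b c; do
    (( zone[$z] < zone[$target] )) && target=$z
  done
  zone[$target]=$(( zone[$target] + 1 ))
done
echo "a=${zone[a]} b=${zone[b]} c=${zone[c]}"   # prints a=3 b=2 c=2
```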

Multiple Constraints

You can combine constraints to spread across both zones and nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 12
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: api
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: api
      containers:
        - name: api
          image: api-server:v3

The first constraint (hard) ensures even zone distribution. The second constraint (soft) tries to spread evenly across nodes within each zone but allows imbalance if needed.

whenUnsatisfiable Options

DoNotSchedule (Hard)

The Pod will not be scheduled if it would violate the maxSkew constraint. The Pod stays Pending.

whenUnsatisfiable: DoNotSchedule

ScheduleAnyway (Soft)

The scheduler tries to satisfy the constraint but schedules the Pod on the least-violating node if it cannot achieve the desired skew.

whenUnsatisfiable: ScheduleAnyway

Use ScheduleAnyway for the node-level constraint and DoNotSchedule for the zone-level constraint. This ensures zone spreading is strict while node spreading is best-effort.

matchLabelKeys (Kubernetes 1.27+)

matchLabelKeys scopes the constraint to Pods with the same label values for the specified keys. This is useful during rolling updates to ensure new revision Pods are spread independently of old revision Pods:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
    matchLabelKeys:
      - pod-template-hash  # Only count Pods from the same ReplicaSet

Without this, during a rolling update, the scheduler counts both old and new revision Pods when calculating skew, which can lead to uneven distribution of the new revision.

minDomains

minDomains specifies the minimum number of eligible topology domains; it can only be used together with whenUnsatisfiable: DoNotSchedule. If fewer domains are currently eligible, the scheduler calculates skew as though minDomains domains exist:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    minDomains: 3
    labelSelector:
      matchLabels:
        app: web

If only 2 zones have eligible nodes, the scheduler treats it as if there are 3 zones, which can make the constraint unsatisfiable and keep Pods Pending. This prevents concentrating workload in too few zones.

Cluster-Level Defaults

Administrators can set default topology spread constraints for the entire cluster via the scheduler configuration:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List

This applies a zone-spread constraint to all Pods that do not define their own topologySpreadConstraints.

Debugging Topology Spread

# Check current Pod distribution
kubectl get pods -l app=web -o wide

# Count Pods per zone
kubectl get pods -l app=web -o jsonpath='{range .items[*]}{.spec.nodeName}{"\n"}{end}' | \
  while read node; do kubectl get node $node -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'; echo; done | sort | uniq -c

# Check scheduling events
kubectl describe pod web-abc123 | grep -A 10 Events
# "didn't match pod topology spread constraints"

Comparison with Anti-Affinity

| Feature | Pod Anti-Affinity | Topology Spread |
|---|---|---|
| Goal | Prevent co-location | Ensure even distribution |
| Granularity | Binary (yes/no per domain) | Numeric (maxSkew) |
| Multiple replicas per domain | Not with required rules | Yes, controlled by maxSkew |
| Performance | Evaluates all matching Pods | More efficient algorithm |
| Rolling updates | No revision awareness | matchLabelKeys support |

For most production deployments, topology spread constraints are the better choice for distributing replicas across failure domains.

Why Interviewers Ask This

Interviewers ask this to see if you can design deployments that are resilient to zone and node failures while efficiently utilizing cluster resources.

Common Follow-Up Questions

What is maxSkew?
maxSkew defines the maximum allowed difference in Pod count between any two topology domains. A maxSkew of 1 enforces the most even distribution.
How do topology spread constraints differ from pod anti-affinity?
Anti-affinity is all-or-nothing per domain. Topology spread aims for an even count across domains with a configurable tolerance (maxSkew).
What does whenUnsatisfiable: ScheduleAnyway mean?
It makes the constraint a soft preference. The scheduler tries to satisfy it but schedules the Pod elsewhere if it cannot.

Key Takeaways

  • maxSkew controls the maximum imbalance across topology domains
  • DoNotSchedule is a hard constraint; ScheduleAnyway is a soft preference
  • Topology spread is more flexible than pod anti-affinity for even distribution