How Does Pod Topology Spread Work in Kubernetes?

advanced | pods | devops | sre | platform engineer | CKA
TL;DR

Pod topology spread constraints distribute Pods evenly across topology domains (nodes, zones, regions) based on a maxSkew value, giving you finer control over Pod distribution than anti-affinity alone.

Detailed Answer

Topology spread constraints graduated to stable in Kubernetes 1.19 as a general-purpose mechanism to distribute Pods evenly across configurable topology domains. They solve a problem that pod anti-affinity handles only partially: ensuring balanced distribution rather than just separation.

The Problem with Anti-Affinity Alone

Pod anti-affinity with topologyKey: topology.kubernetes.io/zone prevents two Pods from landing in the same zone, but it does not balance them. With 6 replicas and 3 zones, anti-affinity alone could place 4 Pods in zone-a and 1 each in zone-b and zone-c. Topology spread constraints enforce a maximum imbalance.
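The skew arithmetic behind this example can be sketched in a few lines of Python. This is an illustrative calculation, not scheduler code — the real scheduler also counts eligible domains that currently hold zero matching Pods:

```python
from collections import Counter

def skew(assignments):
    """Skew = highest Pod count in any domain minus the lowest.

    `assignments` maps pod name -> zone. Only zones that appear in
    the mapping are considered here, which is a simplification of
    the scheduler's notion of eligible domains.
    """
    counts = Counter(assignments.values())
    return max(counts.values()) - min(counts.values())

# A layout anti-affinity alone could allow for 6 replicas / 3 zones:
unbalanced = {f"web-{i}": z for i, z in enumerate(
    ["zone-a", "zone-a", "zone-a", "zone-a", "zone-b", "zone-c"])}
print(skew(unbalanced))  # 4 - 1 = 3, violates maxSkew: 1

# The layout a spread constraint with maxSkew: 1 enforces:
balanced = {f"web-{i}": z for i, z in enumerate(
    ["zone-a", "zone-b", "zone-c"] * 2)}
print(skew(balanced))  # 2 - 2 = 0, satisfies maxSkew: 1
```

A constraint with maxSkew: 1 rejects any placement that would push this difference above 1.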

Basic Topology Spread Constraint

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      containers:
        - name: nginx
          image: nginx:1.27
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"

This ensures that across all zones, the difference in web Pod count is at most 1. With 3 zones and 6 replicas, each zone gets exactly 2 Pods.

Key Fields Explained

| Field | Purpose |
|-------|---------|
| maxSkew | Maximum difference in Pod count between any two topology domains |
| topologyKey | Node label that defines topology domains (zone, hostname, region) |
| whenUnsatisfiable | DoNotSchedule (hard) or ScheduleAnyway (soft) |
| labelSelector | Selects which Pods to count for skew calculation |
| matchLabelKeys | (v1.27+) Uses Pod label values to scope the constraint per rollout |
| minDomains | (v1.25+) Minimum number of domains required before enforcing the constraint |
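As an illustration of minDomains — a sketch assuming a three-zone cluster; note that minDomains is only honored together with whenUnsatisfiable: DoNotSchedule:

```yaml
topologySpreadConstraints:
  - maxSkew: 1
    minDomains: 3            # treat the spread as unsatisfied until 3 zones hold Pods
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
```

While fewer than minDomains domains have matching Pods, the global minimum is treated as zero, which pushes new Pods toward empty zones first.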

Multiple Constraints

You can apply multiple topology spread constraints simultaneously — for example, spread across both zones and nodes:

spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
    - maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: ScheduleAnyway
      labelSelector:
        matchLabels:
          app: web

The first constraint is hard (zone-level balance is mandatory), while the second is soft (node-level balance is best-effort).

Cluster-Level Defaults

You can configure default topology spread constraints for the whole cluster via the kube-scheduler --config file (the kubescheduler.config.k8s.io/v1 API shown below is available from Kubernetes 1.25; older releases use the v1beta3 API):

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - pluginConfig:
      - name: PodTopologySpread
        args:
          defaultConstraints:
            - maxSkew: 1
              topologyKey: topology.kubernetes.io/zone
              whenUnsatisfiable: ScheduleAnyway
          defaultingType: List

This applies to all Pods that do not define their own constraints, providing a safety net across the cluster.

matchLabelKeys for Rolling Updates

During a rolling update, old and new ReplicaSets co-exist. Without matchLabelKeys, the constraint counts both old and new Pods together, which can cause imbalance. Setting matchLabelKeys: ["pod-template-hash"] scopes counting to only the Pods from the same ReplicaSet:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web
    matchLabelKeys:
      - pod-template-hash

Debugging Topology Spread

When Pods are stuck in Pending, check scheduler events:

kubectl describe pod <pod-name> | grep -A 10 Events
# Look for: "didn't match pod topology spread constraints"

# Check current distribution
kubectl get pods -l app=web -o wide --sort-by='.spec.nodeName'

# Map nodes to zones to interpret the spread
kubectl get nodes -L topology.kubernetes.io/zone

When to Use Topology Spread vs. Anti-Affinity

| Scenario | Best Tool |
|----------|-----------|
| No two replicas on the same node | Pod anti-affinity |
| Even distribution across zones | Topology spread constraints |
| Co-locate with specific Pods | Pod affinity |
| Combination of balance and separation | Both constraints together |

Why Interviewers Ask This

This question tests your ability to design highly available workloads that remain balanced across failure domains, which is critical for production systems.

Common Follow-Up Questions

What does maxSkew control?
maxSkew defines the maximum allowed difference in Pod count between any two topology domains. A maxSkew of 1 means domains can differ by at most one Pod.
What happens when whenUnsatisfiable is set to DoNotSchedule vs ScheduleAnyway?
DoNotSchedule prevents placement if the constraint cannot be met. ScheduleAnyway places the Pod but the scheduler still tries to minimize skew.
Can you combine topology spread constraints with node affinity?
Yes — node affinity narrows the candidate nodes first, then topology spread distributes Pods evenly within that filtered set.
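A minimal sketch of that combination (the instance-type key and value here are hypothetical examples): node affinity filters the candidate nodes, then the spread constraint balances replicas within that filtered set.

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node.kubernetes.io/instance-type
                operator: In
                values: ["m5.large"]   # hypothetical instance type
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: web
```

From v1.25 the constraint also has a nodeAffinityPolicy field; its default (Honor) means nodes excluded by the affinity are not counted when computing skew.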

Key Takeaways

  • Topology spread constraints provide more granular distribution control than pod anti-affinity.
  • The maxSkew parameter defines the maximum imbalance allowed between topology domains.
  • Combine topology spread with node affinity and pod anti-affinity for sophisticated scheduling strategies.
