How Does StatefulSet Scaling Differ from Deployments?

advanced | statefulsets | devops | sre | platform engineer | CKA
TL;DR

StatefulSet scaling is ordered by default — scale-up creates Pods sequentially from lowest to highest ordinal, and scale-down removes them in reverse. PVCs are retained on scale-down, allowing data recovery on scale-up. This contrasts with Deployments, which scale Pods in parallel.

Detailed Answer

Scaling a StatefulSet is fundamentally different from scaling a Deployment. The ordered, identity-preserving nature of StatefulSets means that scaling operations must respect Pod ordering and persistent storage.

Scale-Up Behavior

When you increase replicas, new Pods are created in ascending ordinal order:

# Scale from 3 to 5
kubectl scale statefulset cassandra --replicas=5

The controller creates:

  1. cassandra-3 — waits until Running and Ready
  2. cassandra-4 — waits until Running and Ready

If cassandra-3 had a PVC from a previous scale-down, it is reattached automatically. The Pod resumes with its previous data.
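The ordered creation is easy to observe by watching the Pods during a scale-up (standard kubectl flags; the `app=cassandra` label matches the selector above):

```shell
# Trigger the scale-up, then watch: ordinals appear one at a time,
# each new Pod waiting for the previous one to be Running and Ready
kubectl scale statefulset cassandra --replicas=5
kubectl get pods -l app=cassandra -w
```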

Scale-Down Behavior

When you decrease replicas, Pods are removed in reverse ordinal order:

# Scale from 5 to 3
kubectl scale statefulset cassandra --replicas=3

The controller:

  1. Terminates cassandra-4 — waits until fully stopped
  2. Terminates cassandra-3 — waits until fully stopped
  3. Does not delete PVCs data-cassandra-3 and data-cassandra-4
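After the scale-down completes, the retained claims are still visible and remain Bound:

```shell
# The PVCs of the removed Pods are not garbage-collected by default;
# listing claims should still show data-cassandra-3 and data-cassandra-4
kubectl get pvc
```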

PVC Retention Policy

Starting with Kubernetes 1.27 (beta, enabled by default; stable in 1.32), StatefulSets support a persistentVolumeClaimRetentionPolicy that controls the PVC lifecycle:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  replicas: 3
  serviceName: "cassandra-headless"
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete      # Delete PVCs when scaling down
    whenDeleted: Retain     # Keep PVCs when StatefulSet is deleted
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.1
          ports:
            - containerPort: 9042
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "standard"
        resources:
          requests:
            storage: 100Gi

The policy options are:

| Field | Value | Behavior |
|---|---|---|
| whenScaled | Retain (default) | PVCs kept on scale-down |
| whenScaled | Delete | PVCs deleted on scale-down |
| whenDeleted | Retain (default) | PVCs kept when StatefulSet is deleted |
| whenDeleted | Delete | PVCs deleted with the StatefulSet |
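The retention policy is one of the mutable StatefulSet fields, so it can be changed on a live object rather than requiring a recreate. A sketch, assuming a cluster where the field is available:

```shell
# Switch scale-down behavior back to retaining PVCs on the live object
kubectl patch statefulset cassandra --type merge -p \
  '{"spec":{"persistentVolumeClaimRetentionPolicy":{"whenScaled":"Retain"}}}'
```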

Scaling with HPA

You can use a HorizontalPodAutoscaler with a StatefulSet, but it requires careful consideration:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cassandra-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: cassandra
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 minutes before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300           # Remove at most 1 Pod every 5 minutes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Key considerations for HPA with StatefulSets:

  • Slow scale-down: Use stabilization windows and conservative policies to prevent rapid scale-down that could affect cluster quorum
  • Data rebalancing: Some applications (Cassandra, Elasticsearch) need time to rebalance data after a member leaves
  • Minimum replicas: Set minReplicas to your quorum size (e.g., 3 for a system that needs majority quorum)
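To confirm the stabilization window and scale-down policies are behaving as intended, inspect the HPA's status and events:

```shell
# Current vs. target metric values and the active replica count
kubectl get hpa cassandra-hpa
# Conditions and recent scaling events, including why a scale-down was deferred
kubectl describe hpa cassandra-hpa
```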

PodDisruptionBudgets for Safe Scaling

Always pair StatefulSets with a PodDisruptionBudget to protect quorum:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cassandra-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: cassandra

This ensures that at least 2 Pods are always available, preventing scaling or voluntary disruptions from breaking quorum.
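The budget's current headroom can be checked at any time; `ALLOWED DISRUPTIONS` shows how many Pods may be voluntarily evicted right now:

```shell
# 0 allowed disruptions means any further voluntary eviction is blocked
kubectl get pdb cassandra-pdb
```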

Comparison with Deployment Scaling

| Aspect | Deployment | StatefulSet |
|---|---|---|
| Scale-up speed | Parallel (fast) | Sequential (slower) |
| Scale-down speed | Parallel (fast) | Sequential (slower) |
| Storage on scale-down | N/A | PVCs retained |
| Identity preservation | No | Yes |
| HPA compatibility | Full | Requires careful tuning |

Why Interviewers Ask This

This advanced question tests your understanding of how scaling stateful workloads interacts with storage, ordering, and data preservation — critical knowledge for running databases at scale.

Common Follow-Up Questions

What happens to the PVCs when you scale a StatefulSet down from 5 to 3?
PVCs for Pods 3 and 4 are retained. If you scale back to 5, the Pods reattach to their original PVCs with data intact.
Can you scale a StatefulSet with an HPA?
Yes, but you must be careful with stateful workloads. Rapid scaling can overwhelm a database cluster. Consider custom metrics and scale-down stabilization windows.
How does the Parallel pod management policy affect scaling?
With Parallel policy, all new Pods are created simultaneously during scale-up, and all excess Pods are terminated simultaneously during scale-down.
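The ordering behavior is selected per StatefulSet via the podManagementPolicy field (a fragment of the apps/v1 spec; it affects scaling only, not rolling updates):

```yaml
# Fragment: opt out of ordered scaling for this StatefulSet
spec:
  podManagementPolicy: Parallel   # default: OrderedReady
```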

Key Takeaways

  • Scale-up is sequential (0 → N) and scale-down is reverse-sequential (N → 0) by default.
  • By default (whenScaled: Retain), PVCs are not deleted during scale-down, preserving data for future scale-up.
  • Use PodDisruptionBudgets to ensure quorum-based applications maintain enough healthy replicas during scaling.
