How Does StatefulSet Scaling Differ from Deployments?
StatefulSet scaling is ordered by default — scale-up creates Pods sequentially from lowest to highest ordinal, and scale-down removes them in reverse. PVCs are retained on scale-down, allowing data recovery on scale-up. This contrasts with Deployments, which scale Pods in parallel.
Detailed Answer
Scaling a StatefulSet is fundamentally different from scaling a Deployment. The ordered, identity-preserving nature of StatefulSets means that scaling operations must respect Pod ordering and persistent storage.
Scale-Up Behavior
When you increase replicas, new Pods are created in ascending ordinal order:
```bash
# Scale from 3 to 5
kubectl scale statefulset cassandra --replicas=5
```
The controller creates:
1. `cassandra-3` — waits until Running and Ready
2. `cassandra-4` — waits until Running and Ready

If `cassandra-3` had a PVC from a previous scale-down, it is reattached automatically. The Pod resumes with its previous data.
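The ordinal sequence can be illustrated with a small sketch (plain Python for illustration, not the actual controller code): Pod names follow the `<statefulset>-<ordinal>` pattern, and scale-up fills in the missing ordinals from lowest to highest.

```python
# Sketch of StatefulSet scale-up ordering (illustrative, not the real controller).
def scale_up_order(name: str, current: int, desired: int) -> list[str]:
    """Pods are created sequentially, lowest ordinal first: <name>-<ordinal>."""
    return [f"{name}-{i}" for i in range(current, desired)]

# Scaling cassandra from 3 to 5 replicas creates:
print(scale_up_order("cassandra", 3, 5))  # ['cassandra-3', 'cassandra-4']
```

Each Pod in that list must be Running and Ready before the controller creates the next one.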
Scale-Down Behavior
When you decrease replicas, Pods are removed in reverse ordinal order:
```bash
# Scale from 5 to 3
kubectl scale statefulset cassandra --replicas=3
```
The controller:
- Terminates `cassandra-4` and waits until it is fully stopped
- Terminates `cassandra-3` and waits until it is fully stopped
- Does not delete the PVCs `data-cassandra-3` and `data-cassandra-4`
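The reverse ordering and the surviving claims can be sketched the same way (illustrative Python, not controller code); PVC names follow the `<claimTemplate>-<statefulset>-<ordinal>` pattern.

```python
# Sketch of StatefulSet scale-down ordering and default PVC retention.
def scale_down_order(name: str, current: int, desired: int) -> list[str]:
    """Pods are terminated sequentially, highest ordinal first."""
    return [f"{name}-{i}" for i in range(current - 1, desired - 1, -1)]

def retained_pvcs(name: str, claim: str, current: int, desired: int) -> list[str]:
    """Under the default Retain behavior, PVCs of removed Pods are kept."""
    return [f"{claim}-{name}-{i}" for i in range(desired, current)]

# Scaling cassandra from 5 to 3:
print(scale_down_order("cassandra", 5, 3))       # ['cassandra-4', 'cassandra-3']
print(retained_pvcs("cassandra", "data", 5, 3))  # ['data-cassandra-3', 'data-cassandra-4']
```

A later scale-up back to 5 reattaches those two claims to the recreated Pods.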
PVC Retention Policy
StatefulSets support a `persistentVolumeClaimRetentionPolicy` (beta in Kubernetes 1.27, stable in 1.32) that controls the PVC lifecycle:
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: cassandra
spec:
  replicas: 3
  serviceName: "cassandra-headless"
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete    # Delete PVCs when scaling down
    whenDeleted: Retain   # Keep PVCs when the StatefulSet is deleted
  selector:
    matchLabels:
      app: cassandra
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      containers:
        - name: cassandra
          image: cassandra:4.1
          ports:
            - containerPort: 9042
          resources:
            requests:
              cpu: "500m"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "4Gi"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: "standard"
        resources:
          requests:
            storage: 100Gi
```
The policy options are:
| Field | Value | Behavior |
|---|---|---|
| `whenScaled` | `Retain` (default) | PVCs kept on scale-down |
| `whenScaled` | `Delete` | PVCs deleted on scale-down |
| `whenDeleted` | `Retain` (default) | PVCs kept when the StatefulSet is deleted |
| `whenDeleted` | `Delete` | PVCs deleted with the StatefulSet |
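The decision table reduces to a few lines of logic, sketched here in Python (illustrative, not Kubernetes source):

```python
# Sketch of the PVC retention decision: is a Pod's PVC deleted for a given
# policy and event? Event is "scaled" (Pod removed by scale-down) or
# "deleted" (the whole StatefulSet is deleted).
def pvc_deleted(policy: dict, event: str) -> bool:
    field = "whenScaled" if event == "scaled" else "whenDeleted"
    return policy.get(field, "Retain") == "Delete"  # Retain is the default

policy = {"whenScaled": "Delete", "whenDeleted": "Retain"}
print(pvc_deleted(policy, "scaled"))   # True: PVCs removed on scale-down
print(pvc_deleted({}, "deleted"))      # False: unset fields default to Retain
```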
Scaling with HPA
You can use a HorizontalPodAutoscaler with a StatefulSet, but it requires careful consideration:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cassandra-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: cassandra
  minReplicas: 3
  maxReplicas: 10
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # Wait 10 minutes before scaling down
      policies:
        - type: Pods
          value: 1
          periodSeconds: 300           # Remove at most 1 Pod every 5 minutes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Key considerations for HPA with StatefulSets:
- Slow scale-down: Use stabilization windows and conservative policies to prevent rapid scale-down that could affect cluster quorum
- Data rebalancing: Some applications (Cassandra, Elasticsearch) need time to rebalance data after a member leaves
- Minimum replicas: Set `minReplicas` to your quorum size (e.g., 3 for a system that needs majority quorum)
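The net effect of those settings can be sketched as a simple clamp (a hypothetical helper, not the actual HPA algorithm): desired replicas stay within bounds, and scale-down proceeds at most one Pod per evaluation period.

```python
# Sketch of conservative HPA behavior for a StatefulSet: clamp the desired
# replica count to [min_replicas, max_replicas] and remove at most
# max_step_down Pods per period. Illustrative only.
def next_replicas(current: int, desired: int, min_replicas: int = 3,
                  max_replicas: int = 10, max_step_down: int = 1) -> int:
    desired = max(min_replicas, min(max_replicas, desired))
    if desired < current:
        return current - min(max_step_down, current - desired)  # slow scale-down
    return desired

# A CPU dip suggesting 2 replicas still only removes one Pod, and never
# drops below the quorum floor of 3:
print(next_replicas(5, 2))  # 4
```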
PodDisruptionBudgets for Safe Scaling
Always pair StatefulSets with a PodDisruptionBudget to protect quorum:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: cassandra-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: cassandra
```
This ensures that at least 2 Pods are always available, preventing scaling or voluntary disruptions from breaking quorum.
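The eviction check a PDB enforces boils down to one comparison, sketched here (illustrative Python, not the eviction API):

```python
# Sketch of the PDB check: a voluntary disruption (eviction) is allowed
# only if removing one Pod still leaves minAvailable healthy Pods.
def eviction_allowed(healthy: int, min_available: int) -> bool:
    return healthy - 1 >= min_available

print(eviction_allowed(3, 2))  # True: 2 Pods remain, quorum holds
print(eviction_allowed(2, 2))  # False: eviction would break minAvailable
```

Note that a PDB only guards voluntary disruptions (drains, evictions); it does not block `kubectl scale`, which is why conservative HPA behavior still matters.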
Comparison with Deployment Scaling
| Aspect | Deployment | StatefulSet |
|---|---|---|
| Scale-up speed | Parallel (fast) | Sequential (slower) |
| Scale-down speed | Parallel (fast) | Sequential (slower) |
| Storage on scale-down | N/A | PVCs retained by default |
| Identity preservation | No | Yes |
| HPA compatibility | Full | Requires careful tuning |
Why Interviewers Ask This
This advanced question tests your understanding of how scaling stateful workloads interacts with storage, ordering, and data preservation — critical knowledge for running databases at scale.
Key Takeaways
- Scale-up is sequential (0 → N) and scale-down is reverse-sequential (N → 0) by default.
- By default, PVCs are not deleted during scale-down, preserving data for future scale-up; set `whenScaled: Delete` in the `persistentVolumeClaimRetentionPolicy` to change this.
- Use PodDisruptionBudgets to ensure quorum-based applications maintain enough healthy replicas during scaling.