kubectl scale
Set a new size for a deployment, ReplicaSet, or StatefulSet by updating the replica count.
kubectl scale [TYPE] [NAME] --replicas=COUNT [flags]
Common Flags
| Flag | Short | Description |
|---|---|---|
| --replicas | — | The new desired number of replicas (required) |
| --current-replicas | — | Precondition: only scale if the current replica count matches this value |
| --resource-version | — | Precondition: only scale if the resource version matches this value |
| --timeout | — | The length of time to wait before giving up on the scale operation |
| --selector | -l | Label selector to filter resources to scale |
| --all | — | Select all resources of the specified type |
Examples
Scale a deployment to 5 replicas
kubectl scale deployment/my-app --replicas=5
Scale only if current replicas is 3
kubectl scale deployment/my-app --replicas=5 --current-replicas=3
Scale to zero (stop all pods)
kubectl scale deployment/my-app --replicas=0
Scale multiple deployments
kubectl scale deployment/app1 deployment/app2 --replicas=3
Scale all deployments with a label
kubectl scale deployment -l tier=frontend --replicas=3
Scale a StatefulSet
kubectl scale statefulset/postgresql --replicas=3
Scale from a file
kubectl scale -f deployment.yaml --replicas=5
When to Use kubectl scale
kubectl scale adjusts the number of pod replicas for a workload. It is the fastest way to manually scale up for increased traffic, scale down to save resources, or scale to zero for maintenance. For automatic scaling based on metrics, use kubectl autoscale or define an HPA.
Scaling Up and Down
# Scale up for increased load
kubectl scale deployment/my-app --replicas=10
# Scale back down after the peak
kubectl scale deployment/my-app --replicas=3
# Verify the current replica count
kubectl get deployment my-app
Scaling up creates new pods immediately. They go through the normal pod lifecycle: Pending, ContainerCreating, Running. Scaling down terminates excess pods gracefully, respecting the terminationGracePeriodSeconds.
Scaling to Zero
Scaling to zero is useful for maintenance, cost savings, or temporarily disabling a service:
# Stop all pods (but keep the deployment)
kubectl scale deployment/my-app --replicas=0
# Verify no pods are running
kubectl get pods -l app=my-app
# No resources found
# Scale back up when ready
kubectl scale deployment/my-app --replicas=3
The deployment, ReplicaSet, service, and other resources remain intact. Only the pods are removed.
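Because the deployment object survives, the old replica count can be stashed on it before scaling to zero and read back later. The sketch below shows that pause/resume pattern; the annotation key `previous-replicas` is an illustrative choice, and the `kubectl` shell function is a stub tracking state in variables so the logic runs without a cluster (delete it and the real CLI takes over; a real `kubectl annotate` may also need `--overwrite` on repeat runs).

```shell
#!/usr/bin/env bash
set -u

# Stub standing in for kubectl: tracks one deployment's replica count and
# one annotation in shell variables. Against a real cluster, delete this.
STUB_REPLICAS=3
STUB_SAVED=""
kubectl() {
  case "$1" in
    scale)    STUB_REPLICAS="${3#--replicas=}" ;;   # kubectl scale DEP --replicas=N
    annotate) STUB_SAVED="${3#*=}" ;;               # kubectl annotate DEP key=value
    get)
      if [[ "$*" == *previous-replicas* ]]; then
        echo "$STUB_SAVED"
      else
        echo "$STUB_REPLICAS"
      fi ;;
  esac
}

pause() {   # remember the current count, then scale to zero
  local dep=$1 current
  current=$(kubectl get "deployment/$dep" -o 'jsonpath={.spec.replicas}')
  kubectl annotate "deployment/$dep" "previous-replicas=$current"
  kubectl scale "deployment/$dep" --replicas=0
}

resume() {  # restore the remembered count
  local dep=$1 saved
  saved=$(kubectl get "deployment/$dep" -o 'jsonpath={.metadata.annotations.previous-replicas}')
  kubectl scale "deployment/$dep" "--replicas=$saved"
}

pause my-app
echo "paused: $(kubectl get deployment/my-app -o 'jsonpath={.spec.replicas}') replicas"
resume my-app
echo "resumed: $(kubectl get deployment/my-app -o 'jsonpath={.spec.replicas}') replicas"
```

Restoring from an annotation beats hardcoding the count in a runbook: the resume step always brings back exactly what was running before the pause.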
Conditional Scaling
The --current-replicas flag provides a precondition to prevent race conditions:
# Only scale if currently at 3 replicas
kubectl scale deployment/my-app --replicas=5 --current-replicas=3
# This prevents conflicts when multiple people or scripts scale simultaneously
# If the current count is not 3, the command fails with a precondition error
This is particularly useful in automated scripts where concurrent scaling operations might occur.
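A retry loop built on this precondition behaves like a compare-and-swap: if the precondition fails because someone scaled concurrently, re-read the count and try again. The sketch below demonstrates the loop; the `kubectl` function is a stub modeling a deployment whose real count is 4 while the script still believes it is 3.

```shell
#!/usr/bin/env bash
set -u

# Stub kubectl: like the real CLI, `scale` exits non-zero when the
# --current-replicas precondition does not match the actual count.
ACTUAL=4
kubectl() {
  case "$1" in
    get) echo "$ACTUAL" ;;
    scale)
      local want="" precond="" arg
      for arg in "$@"; do
        case "$arg" in
          --replicas=*)         want="${arg#*=}" ;;
          --current-replicas=*) precond="${arg#*=}" ;;
        esac
      done
      if [[ -n "$precond" && "$precond" != "$ACTUAL" ]]; then
        echo "error: Expected replicas to be $precond, was $ACTUAL" >&2
        return 1
      fi
      ACTUAL="$want" ;;
  esac
}

# Retry with a freshly read count when the precondition fails.
scale_safely() {
  local dep=$1 target=$2 believed=$3 attempt
  for attempt in 1 2 3; do
    if kubectl scale "deployment/$dep" "--replicas=$target" "--current-replicas=$believed" 2>/dev/null; then
      echo "scaled $dep to $target on attempt $attempt"
      return 0
    fi
    # Someone else scaled in between: refresh our view and retry.
    believed=$(kubectl get "deployment/$dep" -o 'jsonpath={.spec.replicas}')
  done
  return 1
}

scale_safely my-app 5 3
```

With the stub's state, the first attempt fails the precondition (believed 3, actual 4) and the second succeeds after re-reading.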
Scaling Multiple Resources
Scale several deployments at once:
# Scale multiple named deployments
kubectl scale deployment/app1 deployment/app2 deployment/app3 --replicas=2
# Scale all deployments matching a label
kubectl scale deployment -l tier=worker --replicas=5
# Scale all deployments in a namespace
kubectl scale deployment --all --replicas=2 -n staging
Scaling StatefulSets
StatefulSet scaling has ordering guarantees:
# Scale up — pods created in order: 0, 1, 2
kubectl scale statefulset/postgresql --replicas=3
# Watch the ordered creation
kubectl get pods -l app=postgresql -w
# postgresql-0 Running
# postgresql-1 Running
# postgresql-2 Running (new)
# Scale down — pods removed in reverse order: 2, 1
kubectl scale statefulset/postgresql --replicas=1
# postgresql-2 terminated first, then postgresql-1
Each StatefulSet pod has a stable network identity and may have a PersistentVolumeClaim. Scaling down does not delete PVCs, so data is preserved for scale-up.
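For stateful databases it is sometimes safer to step the count down one replica at a time, letting each ordinal fully settle (for example, hand off a primary role) before removing the next. A minimal sketch, with a stub `kubectl` whose `rollout status` is a no-op standing in for waiting on the controller:

```shell
#!/usr/bin/env bash
set -u

# Stub kubectl tracking one StatefulSet's replica count; against a real
# cluster, delete this function and the genuine CLI takes over.
STS_REPLICAS=3
kubectl() {
  case "$1" in
    scale)   STS_REPLICAS="${3#--replicas=}" ;;
    rollout) return 0 ;;   # pretend `kubectl rollout status` reported settled
    get)     echo "$STS_REPLICAS" ;;
  esac
}

# Remove the highest ordinal, wait, repeat until the target is reached.
scale_down_stepwise() {
  local sts=$1 target=$2 current
  current=$(kubectl get "statefulset/$sts" -o 'jsonpath={.spec.replicas}')
  while [ "$current" -gt "$target" ]; do
    current=$((current - 1))
    echo "removing ordinal $current (scaling to $current)"
    kubectl scale "statefulset/$sts" "--replicas=$current"
    kubectl rollout status "statefulset/$sts"   # block until this step settles
  done
}

scale_down_stepwise postgresql 1
```

The per-step `rollout status` is what distinguishes this from a single `--replicas=1`: each removal is confirmed before the next begins.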
Interaction with HPA
If an HPA (Horizontal Pod Autoscaler) manages the deployment, manual scaling is overridden:
# Check if an HPA exists
kubectl get hpa -l app=my-app
# If an HPA exists, it will revert your manual change within its sync period (15 seconds by default)
# To manually scale, first remove the HPA
kubectl delete hpa my-app-hpa
# Then scale
kubectl scale deployment/my-app --replicas=5
# Or adjust the HPA's min/max instead
kubectl patch hpa my-app-hpa -p '{"spec":{"minReplicas":5}}'
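A small wrapper can route the request automatically: patch the HPA's minReplicas when one owns the workload, otherwise scale directly. This is a sketch under the assumption that the HPA is labeled to match its deployment; the `kubectl` stub models a deployment guarded by an HPA named `my-app-hpa`.

```shell
#!/usr/bin/env bash
set -u

# Stub kubectl: `get hpa ... -o name` finds one HPA; `patch` records the
# minReplicas value from the JSON patch; `scale` records a direct scale.
HPA_MIN=2
DEP_REPLICAS=3
kubectl() {
  case "$1" in
    get)   echo "horizontalpodautoscaler.autoscaling/my-app-hpa" ;;
    patch) HPA_MIN=$(tr -dc '0-9' <<< "$4") ;;   # args: patch NAME -p JSON
    scale) DEP_REPLICAS="${3#--replicas=}" ;;
  esac
}

# If an HPA owns the workload, raising minReplicas is what sticks;
# a direct `kubectl scale` would be reverted. Otherwise scale directly.
set_replicas() {
  local dep=$1 n=$2 hpa
  hpa=$(kubectl get hpa -l "app=$dep" -o name)
  if [[ -n "$hpa" ]]; then
    kubectl patch "$hpa" -p "{\"spec\":{\"minReplicas\":$n}}"
  else
    kubectl scale "deployment/$dep" "--replicas=$n"
  fi
}

set_replicas my-app 5
```

Here the HPA path is taken: minReplicas becomes 5 and the deployment's replica count is left for the autoscaler to raise.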
Emergency Scaling
During incidents, quick scaling can be critical:
# Emergency scale up during traffic spike
kubectl scale deployment/my-app --replicas=20
# Scale down a misbehaving deployment
kubectl scale deployment/buggy-app --replicas=0
# Scale up the previous version after rollback
kubectl rollout undo deployment/my-app
kubectl scale deployment/my-app --replicas=10
Monitoring Scale Events
After scaling, verify the operation completed:
# Check the deployment
kubectl get deployment my-app
# Watch pods come up
kubectl get pods -l app=my-app -w
# Check events for scheduling issues
kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp
If pods stay in Pending state after scaling up, the cluster may lack resources. Check node capacity with kubectl top nodes and consider adding nodes to the cluster.
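For scripted verification, `kubectl rollout status deployment/NAME` already blocks until a scale settles; the loop below is a sketch showing which fields it effectively compares (`.spec.replicas` against `.status.readyReplicas`). The `kubectl` here is a stub backed by a temp file, with readyReplicas climbing by one per poll to simulate pods coming up.

```shell
#!/usr/bin/env bash
set -u

# Stub kubectl: readyReplicas starts at 2 and climbs toward 5 on each
# poll. Real clusters: delete this function and the temp-file state.
state=$(mktemp)
echo 2 > "$state"
kubectl() {
  if [[ "$*" == *readyReplicas* ]]; then
    local r; r=$(cat "$state")
    echo "$r"
    if [ "$r" -lt 5 ]; then echo $((r + 1)) > "$state"; fi
  else
    echo 5   # .spec.replicas (the desired count)
  fi
}

# Poll until status.readyReplicas matches spec.replicas, with a timeout.
wait_for_scale() {
  local dep=$1 timeout=$2 elapsed=0 desired ready=0
  desired=$(kubectl get "deployment/$dep" -o 'jsonpath={.spec.replicas}')
  while [ "$elapsed" -lt "$timeout" ]; do
    ready=$(kubectl get "deployment/$dep" -o 'jsonpath={.status.readyReplicas}')
    ready=${ready:-0}   # the field is absent while zero pods are ready
    if [ "$ready" = "$desired" ]; then
      echo "deployment/$dep ready: $ready/$desired"
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  echo "timed out with $ready/$desired ready" >&2
  return 1
}

result=$(wait_for_scale my-app 30)
echo "$result"
rm -f "$state"
```

The `${ready:-0}` default matters against real clusters: jsonpath prints nothing for a missing field, and a bare string comparison against the desired count would then misbehave.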
Best Practices
- Use HPA for automatic scaling based on CPU, memory, or custom metrics rather than manual scaling.
- Use --current-replicas in scripts for safe concurrent scaling.
- When scaling to zero, ensure downstream services handle the unavailability gracefully.
- Monitor cluster capacity to ensure scale-up requests can be fulfilled.
- For StatefulSets, understand the ordering implications before scaling.
Common Mistakes
- Manually scaling a deployment that has an HPA, which will immediately override the new replica count.
- Scaling to zero in production without understanding the impact — all traffic to the service will fail.
- Not using --current-replicas for conditional scaling in scripts, which can cause race conditions with concurrent scaling.