kubectl scale

Set a new size for a deployment, ReplicaSet, or StatefulSet by updating the replica count.

kubectl scale (TYPE NAME | -f FILENAME) --replicas=COUNT [flags]

Common Flags

Flag                 Short   Description
--replicas                   The new desired number of replicas (required)
--current-replicas           Precondition: only scale if the current replica count matches this value
--resource-version           Precondition: only scale if the resource version matches this value
--timeout                    The length of time to wait before giving up on the scale operation
--selector           -l      Label selector to filter resources to scale
--all                        Select all resources of the specified type

Examples

Scale a deployment to 5 replicas

kubectl scale deployment/my-app --replicas=5

Scale only if current replicas is 3

kubectl scale deployment/my-app --replicas=5 --current-replicas=3

Scale to zero (stop all pods)

kubectl scale deployment/my-app --replicas=0

Scale multiple deployments

kubectl scale deployment/app1 deployment/app2 --replicas=3

Scale all deployments with a label

kubectl scale deployment -l tier=frontend --replicas=3

Scale a StatefulSet

kubectl scale statefulset/postgresql --replicas=3

Scale from a file

kubectl scale -f deployment.yaml --replicas=5

When to Use kubectl scale

kubectl scale adjusts the number of pod replicas for a workload. It is the fastest way to manually scale up for increased traffic, scale down to save resources, or scale to zero for maintenance. For automatic scaling based on metrics, use kubectl autoscale or define an HPA.
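As a point of comparison, an HPA can be created directly from the command line; the thresholds below are illustrative placeholders, not recommendations:

```shell
# Autoscale between 3 and 10 replicas, targeting ~80% CPU utilization
kubectl autoscale deployment/my-app --min=3 --max=10 --cpu-percent=80
```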

Scaling Up and Down

# Scale up for increased load
kubectl scale deployment/my-app --replicas=10

# Scale back down after the peak
kubectl scale deployment/my-app --replicas=3

# Verify the current replica count
kubectl get deployment my-app

Scaling up creates new pods immediately. They go through the normal pod lifecycle: Pending, ContainerCreating, Running. Scaling down terminates excess pods gracefully, respecting the terminationGracePeriodSeconds.
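To confirm a scale operation has fully settled, one option is to watch the rollout (using the same example deployment name):

```shell
# Blocks until all desired replicas are updated and ready
kubectl rollout status deployment/my-app

# The READY column shows ready/desired counts once complete
kubectl get deployment my-app
```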

Scaling to Zero

Scaling to zero is useful for maintenance, cost savings, or temporarily disabling a service:

# Stop all pods (but keep the deployment)
kubectl scale deployment/my-app --replicas=0

# Verify no pods are running
kubectl get pods -l app=my-app
# No resources found

# Scale back up when ready
kubectl scale deployment/my-app --replicas=3

The deployment, ReplicaSet, service, and other resources remain intact. Only the pods are removed.
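To confirm that only the replica count changed, you can inspect the object directly; a quick sketch, reusing the example name above:

```shell
# The Deployment object persists with zero desired replicas
kubectl get deployment my-app -o jsonpath='{.spec.replicas}'
# prints: 0
```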

Conditional Scaling

The --current-replicas flag provides a precondition to prevent race conditions:

# Only scale if currently at 3 replicas
kubectl scale deployment/my-app --replicas=5 --current-replicas=3

# This prevents conflicts when multiple people or scripts scale simultaneously
# If the current count is not 3, the command fails with a precondition error

This is particularly useful in automated scripts where concurrent scaling operations might occur.
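A minimal sketch of how a script might use the precondition; the deployment name and target count are placeholders:

```shell
#!/bin/sh
# Read the observed count, then scale only if it is still current.
# If another actor scaled in between, kubectl fails the precondition
# and exits non-zero instead of silently overwriting the change.
current=$(kubectl get deployment my-app -o jsonpath='{.spec.replicas}')
if ! kubectl scale deployment/my-app --replicas=5 --current-replicas="$current"; then
  echo "replica count changed concurrently (saw $current); aborting" >&2
  exit 1
fi
```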

Scaling Multiple Resources

Scale several deployments at once:

# Scale multiple named deployments
kubectl scale deployment/app1 deployment/app2 deployment/app3 --replicas=2

# Scale all deployments matching a label
kubectl scale deployment -l tier=worker --replicas=5

# Scale all deployments in a namespace
kubectl scale deployment --all --replicas=2 -n staging

Scaling StatefulSets

StatefulSet scaling has ordering guarantees:

# Scale up — pods created in order: 0, 1, 2
kubectl scale statefulset/postgresql --replicas=3

# Watch the ordered creation
kubectl get pods -l app=postgresql -w
# postgresql-0   Running
# postgresql-1   Running
# postgresql-2   Running (new)

# Scale down — pods removed in reverse order: 2, 1
kubectl scale statefulset/postgresql --replicas=1
# postgresql-2 terminated first, then postgresql-1

Each StatefulSet pod has a stable network identity and may have a PersistentVolumeClaim. Scaling down does not delete PVCs, so data is preserved for scale-up.
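You can verify the retained claims after scaling down; this sketch assumes the StatefulSet's volume claim template is named data, so claims follow the data-<statefulset>-<ordinal> pattern:

```shell
# Claims for the removed pods are still Bound
kubectl get pvc
# data-postgresql-0, data-postgresql-1, data-postgresql-2 remain listed
```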

Interaction with HPA

If an HPA (Horizontal Pod Autoscaler) manages the deployment, manual scaling is overridden:

# Check if an HPA exists
kubectl get hpa -l app=my-app

# If HPA exists, it will override your manual scale within seconds
# To manually scale, first remove the HPA
kubectl delete hpa my-app-hpa

# Then scale
kubectl scale deployment/my-app --replicas=5

# Or adjust the HPA's min/max instead
kubectl patch hpa my-app-hpa -p '{"spec":{"minReplicas":5}}'

Emergency Scaling

During incidents, quick scaling can be critical:

# Emergency scale up during traffic spike
kubectl scale deployment/my-app --replicas=20

# Scale down a misbehaving deployment
kubectl scale deployment/buggy-app --replicas=0

# Scale up the previous version after rollback
kubectl rollout undo deployment/my-app
kubectl scale deployment/my-app --replicas=10

Monitoring Scale Events

After scaling, verify the operation completed:

# Check the deployment
kubectl get deployment my-app

# Watch pods come up
kubectl get pods -l app=my-app -w

# Check events for scheduling issues
kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp

If pods stay in Pending state after scaling up, the cluster may lack resources. Check node capacity with kubectl top nodes and consider adding nodes to the cluster.
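When scale-up stalls in Pending, these checks usually locate the bottleneck (the pod name is a placeholder):

```shell
# Why is the scheduler refusing the pod? Look at the Events section
kubectl describe pod <pending-pod-name>

# Per-node usage, and what is already allocated on each node
kubectl top nodes
kubectl describe nodes | grep -A 5 "Allocated resources"
```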

Best Practices

  • Use an HPA for automatic scaling based on CPU, memory, or custom metrics rather than manual scaling.
  • Use --current-replicas in scripts for safe concurrent scaling.
  • When scaling to zero, ensure downstream services handle the unavailability gracefully.
  • Monitor cluster capacity to ensure scale-up requests can be fulfilled.
  • For StatefulSets, understand the ordering implications before scaling.
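In scripts, it can also help to block until a manual scale has actually completed. One way, assuming the Deployment publishes the standard Available condition:

```shell
# Scale, then wait until the Deployment reports Available (or time out)
kubectl scale deployment/my-app --replicas=10
kubectl wait deployment/my-app --for=condition=Available --timeout=120s
```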

Interview Questions About This Command

How do you scale a deployment to zero pods?
Use kubectl scale deployment/<name> --replicas=0. This terminates all pods but keeps the Deployment, ReplicaSet, and other configuration intact for easy scale-up later.
What happens when you scale a deployment that has an HPA attached?
The HPA will override your manual scale setting. The HPA continuously adjusts the replica count based on metrics, so manual scaling is ineffective. Disable or delete the HPA first for manual control.
How does scaling a StatefulSet differ from scaling a Deployment?
StatefulSets scale up by creating pods in order (0, 1, 2...) and scale down in reverse order. Each pod has a stable identity and persistent storage. Deployments scale pods without ordering guarantees.

Common Mistakes

  • Manually scaling a deployment that has an HPA, which will immediately override the new replica count.
  • Scaling to zero in production without understanding the impact — all traffic to the service will fail.
  • Not using --current-replicas for conditional scaling in scripts, which can cause race conditions with concurrent scaling.

Related Commands