kubectl autoscale
Automatically scale the number of pods in a deployment, ReplicaSet, or StatefulSet based on observed CPU utilization or other metrics.
kubectl autoscale TYPE NAME [--min=MIN] --max=MAX [--cpu-percent=TARGET] [flags]
Common Flags
| Flag | Short | Description |
|---|---|---|
| --min | — | The minimum number of replicas (lower bound for the autoscaler) |
| --max | — | The maximum number of replicas (upper bound, required) |
| --cpu-percent | — | The target average CPU utilization percentage across all pods |
| --name | — | The name for the autoscaler resource |
| --namespace | -n | Namespace for the autoscaler |
| --dry-run | — | Must be none, server, or client. Preview without creating |
Examples
Autoscale a deployment between 2 and 10 replicas at 80% CPU
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80
Autoscale with only a max specified
kubectl autoscale deployment/my-app --max=5
Generate HPA YAML without creating
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=70 --dry-run=client -o yaml
Autoscale a StatefulSet
kubectl autoscale statefulset/web --min=3 --max=10 --cpu-percent=75
View existing autoscalers
kubectl get hpa
Describe an autoscaler for debugging
kubectl describe hpa my-app
When to Use kubectl autoscale
kubectl autoscale creates a Horizontal Pod Autoscaler (HPA) that automatically adjusts the replica count of a workload based on observed metrics. It is the imperative way to set up auto-scaling for deployments that experience variable traffic patterns.
Creating an Autoscaler
The basic command requires a target deployment and a maximum replica count:
# Scale between 2 and 10 replicas targeting 80% CPU utilization
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80
# View the created HPA
kubectl get hpa my-app
# NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
# my-app Deployment/my-app 45%/80% 2 10 3 30s
The TARGETS column shows current utilization versus the target. The HPA adjusts replicas to keep the current value close to the target.
Prerequisites
For autoscaling to work, two things must be in place:
- Metrics Server must be running in the cluster:
# Verify Metrics Server is available
kubectl top pods
# If this works, Metrics Server is running
- Resource requests must be set on the target containers:
# Check if resource requests are defined
kubectl get deployment my-app -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
# {"cpu":"250m","memory":"128Mi"}
# If not set, the HPA cannot calculate utilization
kubectl set resources deployment/my-app -c app --requests=cpu=250m,memory=128Mi
Understanding the Algorithm
The HPA calculates the desired replica count using this formula:
desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue))
For example:
- Current replicas: 3
- Current average CPU: 90%
- Target CPU: 50%
- Desired: ceil(3 * 90/50) = ceil(5.4) = 6
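The calculation above can be sketched in a few lines of Python. This is an illustration of the published formula, not the controller's actual implementation; the clamping to min/max bounds mirrors what the HPA does after computing the raw value:

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     max_replicas: int,
                     min_replicas: int = 1) -> int:
    """Sketch of the HPA formula: ceil(current * current/target),
    clamped to the configured [min, max] bounds."""
    desired = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(desired, max_replicas))

# The worked example: 3 replicas at 90% CPU against a 50% target
print(desired_replicas(3, 90, 50, max_replicas=10, min_replicas=2))  # 6
```

Note that the result is always clamped: if the formula yields 12 but --max=10, the HPA stops at 10.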
# Watch the HPA make scaling decisions
kubectl get hpa my-app -w
# See detailed scaling events
kubectl describe hpa my-app
Scaling Behavior and Cooldowns
The HPA has built-in stabilization to prevent thrashing:
- Scale up: Responds within 15-30 seconds when metrics exceed the target.
- Scale down: Waits 5 minutes (default stabilization window) after the last scale event before scaling down.
- Tolerance: The HPA has a 10% tolerance band. It does not scale if the ratio is within 0.9-1.1 of the target.
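The tolerance check can be sketched as follows. This assumes the controller-manager's default tolerance of 0.1 (configurable via --horizontal-pod-autoscaler-tolerance); the HPA skips scaling when the current/target ratio is within the band:

```python
DEFAULT_TOLERANCE = 0.1  # controller-manager default

def within_tolerance(current_metric: float, target_metric: float,
                     tolerance: float = DEFAULT_TOLERANCE) -> bool:
    """True when the current/target ratio is inside the tolerance band,
    in which case the HPA makes no scaling change."""
    ratio = current_metric / target_metric
    return abs(ratio - 1.0) <= tolerance

print(within_tolerance(85, 80))  # ratio 1.0625 -> True, no scaling
print(within_tolerance(90, 80))  # ratio 1.125  -> False, scale up
```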
# View the scaling behavior in the HPA spec
kubectl get hpa my-app -o yaml | grep -A 20 behavior
Advanced HPA with YAML
For more complex scaling rules including memory-based scaling and custom metrics, define the HPA in YAML:
# Generate a starter YAML
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80 \
--dry-run=client -o yaml > hpa.yaml
Then edit the YAML to add memory metrics, custom metrics, or scaling policies:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: my-app
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
Managing Autoscalers
# List all HPAs
kubectl get hpa -A
# Check HPA status and conditions
kubectl describe hpa my-app
# Update the HPA
kubectl patch hpa my-app -p '{"spec":{"maxReplicas":20}}'
# Delete the HPA (stops autoscaling, current replica count remains)
kubectl delete hpa my-app
Troubleshooting
Common issues with autoscaling:
# HPA shows <unknown> for targets
kubectl describe hpa my-app
# Check: Metrics Server running? Resource requests set?
# HPA not scaling up
# Check the current metrics
kubectl top pods -l app=my-app
# Verify the target percentage is being exceeded
# HPA not scaling down
# Check the stabilization window and recent scaling events
kubectl describe hpa my-app | grep -A 10 "Events"
Interaction with kubectl scale
Manual scaling conflicts with autoscaling: if an HPA targets a workload, the HPA overrides any manual scale command at its next sync:
# This will be overridden by the HPA within seconds
kubectl scale deployment/my-app --replicas=1
# Instead, adjust the HPA's bounds
kubectl patch hpa my-app -p '{"spec":{"minReplicas":1}}'
Best Practices
- Always set resource requests on pods managed by an HPA.
- Start with conservative min/max values and adjust based on observed behavior.
- Use the autoscaling/v2 API for memory and custom metrics.
- Set appropriate stabilization windows to prevent flapping.
- Monitor HPA decisions in production to verify the scaling behavior matches your expectations.
- Consider Vertical Pod Autoscaler (VPA) for right-sizing resource requests, but do not let VPA and HPA act on the same CPU or memory metrics.
Common Mistakes
- Not setting CPU resource requests on pods, which prevents the HPA from calculating utilization and causes the autoscaler to fail.
- Setting the minimum replicas too high, which wastes resources during low-traffic periods, or too low, which slows the response to sudden load.
- Not understanding the cooldown periods — the HPA waits before scaling down (default 5 minutes) to prevent flapping.