kubectl autoscale

Automatically scale the number of pods in a deployment, ReplicaSet, or StatefulSet based on observed CPU utilization or other metrics.

kubectl autoscale [TYPE] [NAME] --min=MIN --max=MAX [--cpu-percent=TARGET] [flags]

Common Flags

| Flag | Short | Description |
| --- | --- | --- |
| --min | | Minimum number of replicas (lower bound for the autoscaler) |
| --max | | Maximum number of replicas (upper bound; required) |
| --cpu-percent | | Target average CPU utilization percentage across all pods |
| --name | | Name for the autoscaler resource |
| --namespace | -n | Namespace for the autoscaler |
| --dry-run | | One of none, server, or client; preview without creating |

Examples

Autoscale a deployment between 2 and 10 replicas at 80% CPU

kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80

Autoscale with only a max specified

kubectl autoscale deployment/my-app --max=5

Generate HPA YAML without creating

kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=70 --dry-run=client -o yaml

Autoscale a StatefulSet

kubectl autoscale statefulset/web --min=3 --max=10 --cpu-percent=75

View existing autoscalers

kubectl get hpa

Describe an autoscaler for debugging

kubectl describe hpa my-app

When to Use kubectl autoscale

kubectl autoscale creates a Horizontal Pod Autoscaler (HPA) that automatically adjusts the replica count of a workload based on observed metrics. It is the imperative way to set up auto-scaling for deployments that experience variable traffic patterns.

Creating an Autoscaler

The basic command requires a target deployment and a maximum replica count:

# Scale between 2 and 10 replicas targeting 80% CPU utilization
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80

# View the created HPA
kubectl get hpa my-app
# NAME     REFERENCE           TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# my-app   Deployment/my-app   45%/80%   2         10        3          30s

The TARGETS column shows current utilization versus the target. The HPA adjusts replicas to keep the current value close to the target.

Prerequisites

For autoscaling to work, two things must be in place:

  1. Metrics Server must be running in the cluster:
# Verify Metrics Server is available
kubectl top pods
# If this works, Metrics Server is running
  2. Resource requests must be set on the target containers:
# Check if resource requests are defined
kubectl get deployment my-app -o jsonpath='{.spec.template.spec.containers[0].resources.requests}'
# {"cpu":"250m","memory":"128Mi"}

# If not set, the HPA cannot calculate utilization
kubectl set resources deployment/my-app -c app --requests=cpu=250m,memory=128Mi

Understanding the Algorithm

The HPA calculates the desired replica count using this formula:

desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue))

For example:

  • Current replicas: 3
  • Current average CPU: 90%
  • Target CPU: 50%
  • Desired: ceil(3 * 90/50) = ceil(5.4) = 6
# Watch the HPA make scaling decisions
kubectl get hpa my-app -w

# See detailed scaling events
kubectl describe hpa my-app
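The calculation above can be checked with a quick shell one-liner; the values are the hypothetical ones from the example, not live cluster data:

```shell
# ceil(currentReplicas * currentMetricValue / targetMetricValue)
# using the example values: 3 replicas, 90% current CPU, 50% target
awk 'BEGIN {
  replicas = 3; current = 90; target = 50
  desired = replicas * current / target   # 5.4
  c = int(desired); if (desired > c) c++  # ceil
  print c
}'
# 6
```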

Scaling Behavior and Cooldowns

The HPA has built-in stabilization to prevent thrashing:

  • Scale up: Responds within 15-30 seconds when metrics exceed the target.
  • Scale down: Waits 5 minutes (default stabilization window) after the last scale event before scaling down.
  • Tolerance: The HPA has a 10% tolerance band. It does not scale if the ratio is within 0.9-1.1 of the target.
# View the scaling behavior in the HPA spec
kubectl get hpa my-app -o yaml | grep -A 20 behavior
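These windows are tunable through the behavior field of an autoscaling/v2 HPA. A sketch of a stanza that scales up quickly but removes at most one pod per minute (the specific values here are illustrative, not defaults to copy blindly):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100        # may double the replica count
      periodSeconds: 15 # per 15-second window
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Pods
      value: 1          # remove at most one pod
      periodSeconds: 60 # per minute
```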

Advanced HPA with YAML

For more complex scaling rules including memory-based scaling and custom metrics, define the HPA in YAML:

# Generate a starter YAML
kubectl autoscale deployment/my-app --min=2 --max=10 --cpu-percent=80 \
  --dry-run=client -o yaml > hpa.yaml

Then edit the YAML to add memory metrics, custom metrics, or scaling policies:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

Managing Autoscalers

# List all HPAs
kubectl get hpa -A

# Check HPA status and conditions
kubectl describe hpa my-app

# Update the HPA
kubectl patch hpa my-app -p '{"spec":{"maxReplicas":20}}'

# Delete the HPA (stops autoscaling, current replica count remains)
kubectl delete hpa my-app

Troubleshooting

Common issues with autoscaling:

# HPA shows <unknown> for targets
kubectl describe hpa my-app
# Check: Metrics Server running? Resource requests set?

# HPA not scaling up
# Check the current metrics
kubectl top pods -l app=my-app
# Verify the target percentage is being exceeded

# HPA not scaling down
# Check the stabilization window and recent scaling events
kubectl describe hpa my-app | grep -A 10 "Events"

Interaction with kubectl scale

Manual scaling conflicts with the HPA. If an HPA exists, it reverts manual scale commands on its next sync:

# This will be overridden by the HPA within seconds
kubectl scale deployment/my-app --replicas=1

# Instead, adjust the HPA's bounds
kubectl patch hpa my-app -p '{"spec":{"minReplicas":1}}'

Best Practices

  • Always set resource requests on pods managed by an HPA.
  • Start with conservative min/max values and adjust based on observed behavior.
  • Use the autoscaling/v2 API for memory and custom metrics.
  • Set appropriate stabilization windows to prevent flapping.
  • Monitor HPA decisions in production to verify the scaling behavior matches your expectations.
  • Consider using Vertical Pod Autoscaler (VPA) alongside HPA for right-sizing resource requests.

Interview Questions About This Command

What are the prerequisites for kubectl autoscale to work?
The Metrics Server must be installed to provide CPU and memory metrics. The target pods must have CPU resource requests defined, because the HPA calculates utilization as a percentage of the requested CPU.
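To illustrate that last point, utilization is the pod's current usage divided by its CPU request; a quick sanity check with hypothetical numbers:

```shell
# Hypothetical pod: using 200m CPU against a 250m request
awk 'BEGIN { usage = 200; request = 250; printf "%.0f%%\n", usage / request * 100 }'
# 80%
```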
How does the Horizontal Pod Autoscaler calculate the desired replica count?
It uses the formula: desiredReplicas = ceil(currentReplicas * (currentMetricValue / targetMetricValue)). For example, if target CPU is 50% and current average is 80% with 3 replicas, desired = ceil(3 * 80/50) = 5.
What happens if pods do not have resource requests set?
The HPA cannot calculate CPU utilization percentage because there is no baseline. The autoscaler will report an error and will not scale. Always set resource requests on pods managed by an HPA.

Common Mistakes

  • Not setting CPU resource requests on pods, which prevents the HPA from calculating utilization and causes the autoscaler to fail.
  • Setting the minimum replicas too high, wasting resources during low-traffic periods, or too low, causing slow scale-up response.
  • Not understanding the cooldown periods — the HPA waits before scaling down (default 5 minutes) to prevent flapping.

Related Commands