How Does the Horizontal Pod Autoscaler (HPA) Work?
The HPA automatically scales the number of Pod replicas based on observed CPU, memory, or custom metrics. It periodically queries the Metrics API, computes the desired replica count using a target utilization formula, and updates the Deployment or StatefulSet accordingly.
Detailed Answer
How the HPA Works
The Horizontal Pod Autoscaler runs as a control loop in the kube-controller-manager. Every 15 seconds (configurable), it:
- Queries the Metrics API for current metric values.
- Computes the desired replica count based on the target value.
- Updates the scale subresource of the target Deployment or StatefulSet.
The scaling formula is:
```
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))
```
For example, if 3 replicas are running with average CPU at 90% and the target is 50%:
```
desiredReplicas = ceil(3 * (90 / 50)) = ceil(5.4) = 6
```
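The arithmetic can be sketched in Python. This is a simplified model, not the controller's actual code: the real controller also applies a tolerance (0.1 by default) so that small deviations from the target do not trigger a resize.

```python
import math

def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Simplified model of the HPA scaling formula."""
    ratio = current_metric / target_metric
    # Within the tolerance band around 1.0, the HPA makes no change.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(3, 90, 50))  # 6, matching the example above
print(desired_replicas(6, 52, 50))  # 6: within the 10% tolerance, no change
```

The tolerance is why an HPA sitting at, say, 52% against a 50% target does not constantly add and remove a replica.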
Prerequisites
The HPA requires the Metrics Server to be installed:
```shell
# Install Metrics Server
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Verify it is running
kubectl get deployment metrics-server -n kube-system

# Test the metrics API
kubectl top nodes
kubectl top pods
```
Pods must also have resource requests defined, since the HPA calculates utilization as a percentage of the request:
```yaml
resources:
  requests:
    cpu: 200m      # HPA uses this as the 100% baseline
    memory: 256Mi
```
Basic HPA Configuration
Using kubectl
```shell
# Create an HPA for a Deployment
kubectl autoscale deployment web-app \
  --cpu-percent=50 \
  --min=2 \
  --max=20

# Check HPA status
kubectl get hpa
# NAME      REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
# web-app   Deployment/web-app   35%/50%   2         20        3          5m
```
Using YAML (v2 API)
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70
```
When multiple metrics are specified, the HPA computes the desired replica count for each metric and uses the highest value.
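The "highest wins" rule can be illustrated with the same formula as above, using hypothetical readings for the two metrics in the manifest:

```python
import math

def desired_for(current_replicas: int, current: float, target: float) -> int:
    # Per-metric desired count: the same ceil formula from the formula section
    return math.ceil(current_replicas * current / target)

current_replicas = 4
# Hypothetical readings: CPU at 80% vs a 50% target, memory at 60% vs 70%
cpu_desired = desired_for(current_replicas, 80, 50)     # ceil(6.4) = 7
mem_desired = desired_for(current_replicas, 60, 70)     # ceil(3.43) = 4
print(max(cpu_desired, mem_desired))  # 7: the highest per-metric result wins
```

Taking the maximum ensures the workload is scaled enough to satisfy every metric's target simultaneously.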
Custom Metrics
For application-specific scaling (HTTP requests per second, queue depth), use custom metrics via the Prometheus Adapter:
```shell
# Install Prometheus Adapter
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus.monitoring.svc:9090
```
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 100
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
```
External Metrics
Scale based on metrics not tied to any Pod, such as a cloud message queue depth:
```yaml
metrics:
- type: External
  external:
    metric:
      name: sqs_queue_messages
      selector:
        matchLabels:
          queue: order-processing
    target:
      type: AverageValue
      averageValue: 10
```
Scaling Behavior and Stabilization
The behavior field (autoscaling/v2) controls how fast the HPA scales up and down:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100        # Double the replicas
        periodSeconds: 60
      - type: Pods
        value: 5          # Or add 5 Pods
        periodSeconds: 60
      selectPolicy: Max   # Use whichever adds more Pods
    scaleDown:
      stabilizationWindowSeconds: 300  # Wait 5 min before scaling down
      policies:
      - type: Percent
        value: 10         # Remove at most 10% of replicas
        periodSeconds: 60
      selectPolicy: Min   # Use the most conservative policy
```
Key settings:
- stabilizationWindowSeconds: How long to wait before applying a scale decision. Prevents flapping.
- policies: Define the rate of change (by percentage or absolute count).
- selectPolicy: `Max` (most aggressive), `Min` (most conservative), or `Disabled`.
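How the policies interact can be sketched as follows. This is a simplified model (it ignores the stabilization history and period tracking the real controller keeps), using the scale-up policies from the manifest above:

```python
import math

def scale_up_limit(current: int, policies, select: str = "Max") -> int:
    """Sketch of how behavior policies cap a scale-up.

    Each policy yields a maximum allowed replica count for its period;
    selectPolicy decides which policy's limit applies.
    """
    limits = []
    for ptype, value in policies:
        if ptype == "Percent":
            limits.append(current + math.ceil(current * value / 100))
        else:  # "Pods"
            limits.append(current + value)
    return max(limits) if select == "Max" else min(limits)

# Policies from the manifest: +100% or +5 Pods, selectPolicy: Max
print(scale_up_limit(10, [("Percent", 100), ("Pods", 5)]))  # 20: doubling wins
print(scale_up_limit(3, [("Percent", 100), ("Pods", 5)]))   # 8: +5 Pods wins
```

Note how `Max` favors the percentage policy at high replica counts and the absolute policy at low counts, which is exactly why combining both is a common pattern.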
Monitoring HPA Decisions
```shell
# View HPA details and events
kubectl describe hpa web-app
# Key events to watch:
#   "New size: 6; reason: cpu resource utilization above target"
#   "New size: 3; reason: All metrics below target"

# View HPA metrics
kubectl get hpa web-app -o yaml

# Check HPA conditions
kubectl get hpa web-app -o jsonpath='{.status.conditions[*].message}'
```
Common Issues
| Problem | Cause | Fix |
|---|---|---|
| TARGETS shows `<unknown>/50%` | Metrics Server not installed or Pods lack resource requests | Install Metrics Server and set resource requests |
| HPA never scales up | Target utilization is higher than actual usage | Lower the target percentage or check if requests are too high |
| HPA never scales down | Stabilization window is too long | Reduce scaleDown.stabilizationWindowSeconds |
| Flapping between replica counts | No stabilization window configured | Add behavior.scaleDown.stabilizationWindowSeconds |
HPA with VPA
The Vertical Pod Autoscaler (VPA) adjusts resource requests/limits, while HPA adjusts replica count. They should not both target the same metric (e.g., CPU). A common pattern is to use VPA in recommendation mode to right-size requests and HPA to scale based on custom metrics or CPU utilization.
Best Practices
- Always set resource requests on Pods targeted by HPA.
- Use custom metrics for business-aware scaling (requests/sec, queue depth).
- Configure scale-down stabilization (300s minimum) to prevent flapping.
- Set sensible min/max replica bounds based on capacity planning.
- Monitor HPA events to understand scaling decisions.
Why Interviewers Ask This
Interviewers ask this to verify that you can configure auto-scaling for production workloads and understand how Kubernetes responds to changing load.
Key Takeaways
- HPA requires Metrics Server to be installed for CPU/memory scaling
- Pods must have resource requests defined for CPU-based scaling to work
- Use behavior policies to control scale-up and scale-down rates