How Does Custom Metrics Autoscaling Work in Kubernetes?
Custom metrics autoscaling configures the HPA to scale based on application-specific metrics like requests per second, queue depth, or business metrics instead of just CPU and memory. It requires a metrics adapter that exposes custom metrics through the Kubernetes metrics API.
Detailed Answer
The default HPA scales on CPU and memory utilization, but these metrics often poorly represent actual application load. Custom metrics autoscaling lets you scale on application-specific signals like request rate, response latency, queue depth, or active connections.
Kubernetes Metrics APIs
The HPA queries three different metrics APIs:
| API | Source | Use Case |
|-----|--------|----------|
| metrics.k8s.io | metrics-server | CPU and memory utilization |
| custom.metrics.k8s.io | Metrics adapter | Per-Pod or per-object application metrics |
| external.metrics.k8s.io | Metrics adapter | External system metrics (cloud queues, etc.) |
Architecture
Application → Prometheus → Prometheus Adapter → Kubernetes Metrics API → HPA

1. The application exposes a `/metrics` endpoint.
2. Prometheus scrapes those metrics.
3. The Prometheus Adapter translates them into the Kubernetes metrics format.
4. The adapter serves them through the standard metrics API.
5. The HPA queries the API and scales replicas.
Setting Up Prometheus Adapter
```shell
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus.monitoring:9090
```
Configure the adapter to expose specific metrics:
```yaml
# prometheus-adapter ConfigMap
rules:
  # Expose http_requests_per_second as a Pod metric
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
  # Expose active_connections as a Pod metric
  - seriesQuery: 'active_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      as: "active_connections"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```
Verify the custom metric is available:
```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
```
HPA with Custom Metrics
Scale on Requests Per Second
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # Target: 100 RPS per Pod
```
When average RPS across Pods exceeds 100, the HPA adds replicas. When it drops below, replicas are removed.
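Under the hood, the HPA computes the desired replica count as `desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)`. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """HPA core formula: desiredReplicas = ceil(currentReplicas * current / target)."""
    return math.ceil(current_replicas * (current_value / target_value))

# 4 Pods averaging 180 RPS each against the 100 RPS target -> scale out to 8
print(desired_replicas(4, 180, 100))  # 8
```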
Scale on Multiple Metrics
```yaml
spec:
  metrics:
    # CPU (keep as a safety net)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Request rate (primary scaling signal)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    # Response latency (scale up when slow)
    - type: Pods
      pods:
        metric:
          name: http_request_duration_p99_milliseconds
        target:
          type: AverageValue
          averageValue: "500"  # Target: p99 < 500ms
```
When multiple metrics are defined, the HPA calculates a desired replica count for each metric independently and uses the largest result, so the most demanding metric always wins.
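This "highest wins" rule can be sketched in Python (the current values below are illustrative; the targets mirror the manifest above):

```python
import math

def desired_for(current_replicas: int, current: float, target: float) -> int:
    # Per-metric desired count: ceil(currentReplicas * current / target)
    return math.ceil(current_replicas * current / target)

# (current, target) pairs for CPU %, RPS per Pod, and p99 latency in ms
metrics = [(60, 70), (150, 100), (300, 500)]
desired = max(desired_for(5, c, t) for c, t in metrics)
print(desired)  # 8 -- driven by the request-rate metric
```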
External Metrics
External metrics are not tied to Kubernetes objects. They are useful for scaling based on cloud service metrics:
```yaml
spec:
  metrics:
    - type: External
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: "20"  # Target: 20 messages per Pod
```
This requires the metrics adapter to expose SQS queue depth as an external metric.
Object Metrics
Object metrics reference a specific Kubernetes object:
```yaml
spec:
  metrics:
    - type: Object
      object:
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: api-ingress
        metric:
          name: requests_per_second
        target:
          type: Value
          value: "2000"  # Scale when the Ingress receives > 2000 RPS
```
Scaling Behavior Configuration
Fine-tune scaling speed to prevent flapping:
```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100        # Can double replicas per period
          periodSeconds: 60
        - type: Pods
          value: 5          # Or add 5 Pods per period
          periodSeconds: 60
      selectPolicy: Max     # Use whichever adds more replicas
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute cooldown
      policies:
        - type: Percent
          value: 10         # Remove at most 10% per period
          periodSeconds: 60
```
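To make these policies concrete, here is a rough sketch of the per-period replica limits they impose (illustrative only; the controller's exact rounding may differ slightly):

```python
def scale_up_limit(current: int, percent: int, pods: int) -> int:
    # selectPolicy: Max -- allow whichever policy adds more replicas per period
    return current + max(current * percent // 100, pods)

def scale_down_limit(current: int, percent: int) -> int:
    # Remove at most percent% of current replicas per period
    return current - current * percent // 100

print(scale_up_limit(10, 100, 5))  # 20: doubling (adds 10) beats adding 5 Pods
print(scale_down_limit(50, 10))    # 45: at most 5 of 50 Pods removed per minute
```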
Choosing the Right Metric
| Workload Type | Best Metric | Why |
|---------------|-------------|-----|
| Web API | Requests per second | Directly measures load |
| Worker/Consumer | Queue depth | Measures pending work |
| WebSocket server | Active connections | Measures concurrent users |
| ML inference | Inference queue latency | Measures user-facing performance |
| Database proxy | Connection pool utilization | Measures capacity pressure |
Debugging Custom Metrics HPA
```shell
# Check HPA status
kubectl get hpa api-server-hpa -n production
kubectl describe hpa api-server-hpa -n production

# Verify the custom metrics API is registered
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

# Check a specific metric value
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .

# Check external metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/production/sqs_queue_depth" | jq .

# HPA events
kubectl describe hpa api-server-hpa | grep -A 10 Events
```
Common Pitfalls
- Metric not found: The adapter configuration does not match the Prometheus metric name
- Stale metrics: Prometheus scrape interval is too long, causing delayed scaling
- Metric averaging issues: HPA averages across Pods — if some Pods are idle (just started), the average is misleading
- Scale-down too aggressive: Without stabilization windows, transient metric drops cause unnecessary scale-down
- Ignoring CPU as fallback: Always keep a CPU metric as a safety net alongside custom metrics
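The metric-averaging pitfall is easy to demonstrate: a few freshly started, idle Pods pull the average below the target even while the established Pods are overloaded. A toy illustration:

```python
# Three busy Pods at 150 RPS each, plus two just-started idle Pods
per_pod_rps = [150, 150, 150, 0, 0]
average = sum(per_pod_rps) / len(per_pod_rps)
print(average)  # 90.0 -- below a 100 RPS target, so the HPA would not scale up
```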
Why Interviewers Ask This
CPU and memory utilization often do not correlate with actual application load. Custom metrics autoscaling shows you can design scaling strategies that match real-world demand patterns.
Key Takeaways
- Custom metrics let HPA scale on business-relevant signals like RPS, latency, or queue depth.
- A metrics adapter (like Prometheus Adapter) bridges your monitoring system and the Kubernetes metrics API.
- External metrics support scaling based on metrics outside the cluster (cloud queue depth, external API latency).