How Does Custom Metrics Autoscaling Work in Kubernetes?
Custom metrics autoscaling configures the HPA to scale based on application-specific metrics like requests per second, queue depth, or business metrics instead of just CPU and memory. It requires a metrics adapter that exposes custom metrics through the Kubernetes metrics API.
Detailed Answer
The default HPA scales on CPU and memory utilization, but these metrics often poorly represent actual application load. Custom metrics autoscaling lets you scale on application-specific signals like request rate, response latency, queue depth, or active connections.
Kubernetes Metrics APIs
The HPA queries three different metrics APIs:
| API | Source | Use Case |
|-----|--------|----------|
| metrics.k8s.io | metrics-server | CPU and memory utilization |
| custom.metrics.k8s.io | Metrics adapter | Per-Pod or per-object application metrics |
| external.metrics.k8s.io | Metrics adapter | External system metrics (cloud queues, etc.) |
Architecture
Application → Prometheus → Prometheus Adapter → Kubernetes Metrics API → HPA

1. The application exposes a `/metrics` endpoint.
2. Prometheus scrapes those metrics.
3. The Prometheus Adapter translates them into the Kubernetes metrics format.
4. The adapter serves them through the standard metrics API.
5. The HPA queries the API and scales replicas.
Setting Up Prometheus Adapter
```shell
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus.monitoring:9090
```
Configure the adapter to expose specific metrics:
```yaml
# prometheus-adapter ConfigMap
rules:
  # Expose http_requests_per_second as a Pod metric
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
  # Expose active_connections as a Pod metric
  - seriesQuery: 'active_connections{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      as: "active_connections"
    metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```
Verify the custom metric is available:
```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second"
```
HPA with Custom Metrics
Scale on Requests Per Second
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 50
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"  # Target: 100 RPS per Pod
```
When average RPS across Pods exceeds 100, the HPA adds replicas. When it drops below, replicas are removed.
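Under the hood, the HPA computes the desired replica count as `desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue)`. A minimal Python sketch of that arithmetic (the function name is illustrative):

```python
import math

def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """HPA core formula: desiredReplicas = ceil(currentReplicas * current / target)."""
    return math.ceil(current_replicas * (current_value / target_value))

# 4 Pods averaging 180 RPS each against the 100 RPS target -> scale out to 8
print(desired_replicas(4, 180, 100))  # 8
```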
Scale on Multiple Metrics
```yaml
spec:
  metrics:
    # CPU (keep as a safety net)
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    # Request rate (primary scaling signal)
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    # Response latency (scale up when slow)
    - type: Pods
      pods:
        metric:
          name: http_request_duration_p99_milliseconds
        target:
          type: AverageValue
          averageValue: "500"  # Target: p99 < 500ms
```
When multiple metrics are defined, the HPA calculates a desired replica count for each metric independently and uses the largest result, so the most demanding metric always wins.
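This "highest wins" rule can be sketched in Python (the current values below are illustrative; the targets mirror the manifest above):

```python
import math

def desired_for(current_replicas: int, current: float, target: float) -> int:
    # Per-metric desired count: ceil(currentReplicas * current / target)
    return math.ceil(current_replicas * current / target)

# (current, target) pairs for CPU %, RPS per Pod, and p99 latency in ms
metrics = [(60, 70), (150, 100), (300, 500)]
desired = max(desired_for(5, c, t) for c, t in metrics)
print(desired)  # 8 -- driven by the request-rate metric
```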
External Metrics
External metrics are not tied to Kubernetes objects. They are useful for scaling based on cloud service metrics:
```yaml
spec:
  metrics:
    - type: External
      external:
        metric:
          name: sqs_queue_depth
          selector:
            matchLabels:
              queue: orders
        target:
          type: AverageValue
          averageValue: "20"  # Target: 20 messages per Pod
```
This requires the metrics adapter to expose SQS queue depth as an external metric.
Object Metrics
Object metrics reference a specific Kubernetes object:
```yaml
spec:
  metrics:
    - type: Object
      object:
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: api-ingress
        metric:
          name: requests_per_second
        target:
          type: Value
          value: "2000"  # Scale when the Ingress receives > 2000 RPS
```
Scaling Behavior Configuration
Fine-tune scaling speed to prevent flapping:
```yaml
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100        # Can double replicas per period
          periodSeconds: 60
        - type: Pods
          value: 5          # Or add 5 Pods per period
          periodSeconds: 60
      selectPolicy: Max     # Use whichever adds more replicas
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute cooldown
      policies:
        - type: Percent
          value: 10         # Remove at most 10% per period
          periodSeconds: 60
```
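To make these policies concrete, here is a rough sketch of the per-period replica limits they impose (illustrative only; the controller's exact rounding may differ slightly):

```python
def scale_up_limit(current: int, percent: int, pods: int) -> int:
    # selectPolicy: Max -- allow whichever policy adds more replicas per period
    return current + max(current * percent // 100, pods)

def scale_down_limit(current: int, percent: int) -> int:
    # Remove at most percent% of current replicas per period
    return current - current * percent // 100

print(scale_up_limit(10, 100, 5))  # 20: doubling (adds 10) beats adding 5 Pods
print(scale_down_limit(50, 10))    # 45: at most 5 of 50 Pods removed per minute
```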
Choosing the Right Metric
| Workload Type | Best Metric | Why |
|---------------|-------------|-----|
| Web API | Requests per second | Directly measures load |
| Worker/Consumer | Queue depth | Measures pending work |
| WebSocket server | Active connections | Measures concurrent users |
| ML inference | Inference queue latency | Measures user-facing performance |
| Database proxy | Connection pool utilization | Measures capacity pressure |
Debugging Custom Metrics HPA
```shell
# Check HPA status
kubectl get hpa api-server-hpa -n production
kubectl describe hpa api-server-hpa -n production

# Verify the custom metrics API is registered
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

# Check a specific metric value
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/production/pods/*/http_requests_per_second" | jq .

# Check external metrics
kubectl get --raw "/apis/external.metrics.k8s.io/v1beta1/namespaces/production/sqs_queue_depth" | jq .

# HPA events
kubectl describe hpa api-server-hpa | grep -A 10 Events
```
Common Pitfalls
- Metric not found: The adapter configuration does not match the Prometheus metric name
- Stale metrics: Prometheus scrape interval is too long, causing delayed scaling
- Metric averaging issues: HPA averages across Pods — if some Pods are idle (just started), the average is misleading
- Scale-down too aggressive: Without stabilization windows, transient metric drops cause unnecessary scale-down
- Ignoring CPU as fallback: Always keep a CPU metric as a safety net alongside custom metrics
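The metric-averaging pitfall is easy to demonstrate: a few freshly started, idle Pods pull the average below the target even while the established Pods are overloaded. A toy illustration:

```python
# Three busy Pods at 150 RPS each, plus two just-started idle Pods
per_pod_rps = [150, 150, 150, 0, 0]
average = sum(per_pod_rps) / len(per_pod_rps)
print(average)  # 90.0 -- below a 100 RPS target, so the HPA would not scale up
```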
Why Interviewers Ask This
CPU and memory utilization often do not correlate with actual application load. Custom metrics autoscaling shows you can design scaling strategies that match real-world demand patterns.
Key Takeaways
- Custom metrics let HPA scale on business-relevant signals like RPS, latency, or queue depth.
- A metrics adapter (like Prometheus Adapter) bridges your monitoring system and the Kubernetes metrics API.
- External metrics support scaling based on metrics outside the cluster (cloud queue depth, external API latency).