What Is the Vertical Pod Autoscaler (VPA)?

TL;DR

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests and limits for containers based on actual usage. It right-sizes Pods to match their real resource needs, reducing waste from over-provisioning and preventing OOMKills from under-provisioning.

Detailed Answer

The Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests (and optionally limits) of containers based on historical and real-time resource usage. Unlike the HPA, which scales horizontally (more Pods), the VPA scales vertically (bigger Pods).

The Problem VPA Solves

Most teams set resource requests based on guesses:

resources:
  requests:
    cpu: "500m"      # Guess: maybe it needs half a core?
    memory: "512Mi"  # Guess: probably enough?

In reality:

  • Over-provisioned: Pod uses 50m CPU but requests 500m → 90% wasted
  • Under-provisioned: Pod uses 800Mi memory but has 512Mi limit → OOMKilled

VPA observes actual usage and recommends or applies correct values.

VPA Components

| Component | Role |
|-----------|------|
| Recommender | Analyzes metrics and generates recommendations |
| Updater | Evicts Pods that need resizing (in Auto mode) |
| Admission Controller | Sets resource requests on new Pods |

Installation

# Clone the VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA
./hack/vpa-up.sh

# Verify
kubectl get pods -n kube-system | grep vpa

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"    # Start with recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"
        controlledResources:
          - cpu
          - memory
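A container can also be excluded from VPA entirely by setting its per-container mode to "Off" — useful for sidecars whose resource needs are already stable (the container name below is just an example):

```yaml
# Excerpt: an additional entry under resourcePolicy.containerPolicies
- containerName: istio-proxy   # example sidecar name
  mode: "Off"                  # VPA ignores this container
```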

Update Modes

| Mode | Behavior | Disruption |
|------|----------|------------|
| Off | Generate recommendations only | None |
| Initial | Set requests only on Pod creation | None for running Pods |
| Auto | Evict and recreate Pods with new requests | Pod restarts |
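Once recommendations look stable, the mode can be switched without editing the full manifest — for example (assuming the `api-server-vpa` object from the earlier configuration):

```shell
# Flip the VPA from recommendation-only to active eviction
kubectl patch vpa api-server-vpa -n production --type merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'
```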

Reading VPA Recommendations

kubectl get vpa api-server-vpa -n production -o yaml

# Relevant section of the output:
status:
  recommendation:
    containerRecommendations:
      - containerName: api
        lowerBound:
          cpu: "125m"
          memory: "262144k"
        target:
          cpu: "250m"
          memory: "524288k"
        uncappedTarget:
          cpu: "250m"
          memory: "524288k"
        upperBound:
          cpu: "1"
          memory: "2Gi"

| Field | Meaning |
|-------|---------|
| target | Recommended request (capped by min/max) |
| lowerBound | Conservative minimum (based on a lower percentile, roughly the 50th, of observed usage) |
| upperBound | Peak recommendation (based on a high percentile, roughly the 95th, of observed usage) |
| uncappedTarget | What VPA would recommend without min/max constraints |
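Rather than scrolling through the full YAML, the target values can be pulled out directly with JSONPath (assuming the VPA object from the earlier example):

```shell
# Print just the target recommendation for the first container
kubectl get vpa api-server-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```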

VPA with HPA: The Conflict

VPA and HPA both watch resource utilization but take opposite actions:

High CPU → HPA adds replicas (horizontal)
         → VPA increases CPU request (vertical)

If both act on CPU, they fight each other. Solutions:

  1. VPA for memory, HPA for CPU: Use controlledResources: [memory] in VPA
  2. VPA in Off mode with HPA: Use VPA only for recommendations
  3. Multidimensional Pod Autoscaler (MPA): Combines VPA and HPA (experimental)

# VPA only controls memory, HPA controls CPU-based replicas
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources:
          - memory         # Only manage memory
        minAllowed:
          memory: "128Mi"
        maxAllowed:
          memory: "4Gi"

Goldilocks: VPA Recommendations Dashboard

Goldilocks by Fairwinds creates VPA objects (in Off mode) for every Deployment in labeled namespaces and surfaces the recommendations in a dashboard:

# Add the Fairwinds Helm repository, then install
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace

# Enable for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# View recommendations
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80

In-Place Pod Vertical Scaling (Future)

The InPlacePodVerticalScaling feature (alpha in Kubernetes 1.27, beta and enabled by default since 1.33) allows container resources to be resized without recreating the Pod; VPA support for in-place resize is still being developed:

# Future: resize without eviction
spec:
  containers:
    - name: api
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: RestartContainer

Best Practices

  1. Start with Off mode — review recommendations for 1-2 weeks before enabling Auto
  2. Set min and max bounds — prevent VPA from setting unreasonably small or large requests
  3. Use PDBs — ensure VPA evictions do not take all replicas down simultaneously
  4. Exclude sidecar containers — sidecars often have stable resource needs
  5. Monitor recommendation stability — if recommendations fluctuate wildly, the workload may need HPA instead
  6. Right-size first, then autoscale — use VPA recommendations to set initial requests, then add HPA
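For point 3, a minimal PDB that keeps VPA evictions from draining the Deployment might look like this (the label selector is an assumption about how the `api-server` Pods are labeled):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2              # VPA's Updater respects this during evictions
  selector:
    matchLabels:
      app: api-server          # assumed Pod label
```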

Why Interviewers Ask This

Resource requests are often guessed and rarely updated. VPA demonstrates your understanding of resource efficiency and automated right-sizing, which directly impacts cluster costs.

Common Follow-Up Questions

Can VPA and HPA run together?
Not on the same resource (CPU). VPA adjusts requests while HPA adjusts replicas based on utilization. They conflict on CPU but can work together if HPA uses custom metrics and VPA manages CPU/memory.
What are the three VPA update modes?
Off (recommendations only), Initial (set requests at Pod creation), and Auto (evict and recreate Pods with updated requests). Off mode is safest for production.
Does VPA require Pod restarts?
Currently yes — in Auto mode, VPA evicts Pods and they are recreated with new requests. In-place resize (InPlacePodVerticalScaling) is being developed to avoid restarts.

Key Takeaways

  • VPA automatically right-sizes Pod resource requests based on actual usage patterns.
  • Use VPA in 'Off' mode first to get recommendations without disrupting workloads.
  • VPA and HPA should not both scale on CPU — use one for right-sizing and the other for replica scaling.
