What Is the Vertical Pod Autoscaler (VPA)?

TL;DR

The Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests and limits for containers based on actual usage. It right-sizes Pods to match their real resource needs, reducing waste from over-provisioning and preventing OOMKills from under-provisioning.

Detailed Answer

The Vertical Pod Autoscaler (VPA) automatically adjusts the CPU and memory requests (and optionally limits) of containers based on historical and real-time resource usage. Unlike the HPA, which scales horizontally (more Pods), the VPA scales vertically (bigger Pods).

The Problem VPA Solves

Most teams set resource requests based on guesses:

resources:
  requests:
    cpu: "500m"      # Guess: maybe it needs half a core?
    memory: "512Mi"  # Guess: probably enough?

In reality:

  • Over-provisioned: Pod uses 50m CPU but requests 500m → 90% wasted
  • Under-provisioned: Pod uses 800Mi memory but has 512Mi limit → OOMKilled

VPA observes actual usage and recommends or applies correct values.

VPA Components

| Component | Role |
|-----------|------|
| Recommender | Analyzes metrics and generates recommendations |
| Updater | Evicts Pods that need resizing (in Auto mode) |
| Admission Controller | Sets resource requests on new Pods |

Installation

# Clone the VPA repository
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

# Install VPA
./hack/vpa-up.sh

# Verify
kubectl get pods -n kube-system | grep vpa

VPA Configuration

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"    # Start with recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: "100m"
          memory: "128Mi"
        maxAllowed:
          cpu: "4"
          memory: "8Gi"
        controlledResources:
          - cpu
          - memory
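A container can also be excluded from VPA entirely by setting its per-container mode to "Off" — useful for sidecars whose resource needs are already stable (the container name below is just an example):

```yaml
# Excerpt: an additional entry under resourcePolicy.containerPolicies
- containerName: istio-proxy   # example sidecar name
  mode: "Off"                  # VPA ignores this container
```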

Update Modes

| Mode | Behavior | Disruption |
|------|----------|------------|
| Off | Generate recommendations only | None |
| Initial | Set requests only on Pod creation | None for running Pods |
| Auto | Evict and recreate Pods with new requests | Pod restarts |
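Once recommendations look stable, the mode can be switched without editing the full manifest — for example (assuming the `api-server-vpa` object from the earlier configuration):

```shell
# Flip the VPA from recommendation-only to active eviction
kubectl patch vpa api-server-vpa -n production --type merge \
  -p '{"spec":{"updatePolicy":{"updateMode":"Auto"}}}'
```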

Reading VPA Recommendations

kubectl get vpa api-server-vpa -n production -o yaml

# Relevant section of the output:
status:
  recommendation:
    containerRecommendations:
      - containerName: api
        lowerBound:
          cpu: "125m"
          memory: "262144k"
        target:
          cpu: "250m"
          memory: "524288k"
        uncappedTarget:
          cpu: "250m"
          memory: "524288k"
        upperBound:
          cpu: "1"
          memory: "2Gi"

| Field | Meaning |
|-------|---------|
| target | Recommended request (capped by min/max) |
| lowerBound | Conservative minimum (based on a lower percentile, roughly the 50th, of observed usage) |
| upperBound | Peak recommendation (based on a high percentile, roughly the 95th, of observed usage) |
| uncappedTarget | What VPA would recommend without min/max constraints |
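Rather than scrolling through the full YAML, the target values can be pulled out directly with JSONPath (assuming the VPA object from the earlier example):

```shell
# Print just the target recommendation for the first container
kubectl get vpa api-server-vpa -n production \
  -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'
```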

VPA with HPA: The Conflict

VPA and HPA both watch resource utilization but take opposite actions:

High CPU → HPA adds replicas (horizontal)
         → VPA increases CPU request (vertical)

If both act on CPU, they fight each other. Solutions:

  1. VPA for memory, HPA for CPU: Use controlledResources: [memory] in VPA
  2. VPA in Off mode with HPA: Use VPA only for recommendations
  3. Multidimensional Pod Autoscaler (MPA): Combines VPA and HPA (experimental)

# VPA only controls memory, HPA controls CPU-based replicas
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        controlledResources:
          - memory         # Only manage memory
        minAllowed:
          memory: "128Mi"
        maxAllowed:
          memory: "4Gi"

Goldilocks: VPA Recommendations Dashboard

Goldilocks by Fairwinds creates VPA objects (in Off mode) for every Deployment in labeled namespaces and surfaces the recommendations in a dashboard:

# Add the Fairwinds Helm repository, then install
helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm install goldilocks fairwinds-stable/goldilocks \
  --namespace goldilocks --create-namespace

# Enable for a namespace
kubectl label namespace production goldilocks.fairwinds.com/enabled=true

# View recommendations
kubectl port-forward -n goldilocks svc/goldilocks-dashboard 8080:80

In-Place Pod Vertical Scaling (Future)

The InPlacePodVerticalScaling feature (alpha in Kubernetes 1.27, beta and enabled by default since 1.33) allows container resources to be resized without recreating the Pod; VPA support for in-place resize is still being developed:

# Future: resize without eviction
spec:
  containers:
    - name: api
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired
        - resourceName: memory
          restartPolicy: RestartContainer

Best Practices

  1. Start with Off mode — review recommendations for 1-2 weeks before enabling Auto
  2. Set min and max bounds — prevent VPA from setting unreasonably small or large requests
  3. Use PDBs — ensure VPA evictions do not take all replicas down simultaneously
  4. Exclude sidecar containers — sidecars often have stable resource needs
  5. Monitor recommendation stability — if recommendations fluctuate wildly, the workload may need HPA instead
  6. Right-size first, then autoscale — use VPA recommendations to set initial requests, then add HPA
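For point 3, a minimal PDB that keeps VPA evictions from draining the Deployment might look like this (the label selector is an assumption about how the `api-server` Pods are labeled):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
  namespace: production
spec:
  minAvailable: 2              # VPA's Updater respects this during evictions
  selector:
    matchLabels:
      app: api-server          # assumed Pod label
```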

Why Interviewers Ask This

Resource requests are often guessed and rarely updated. VPA demonstrates your understanding of resource efficiency and automated right-sizing, which directly impacts cluster costs.

Common Follow-Up Questions

Can VPA and HPA run together?
Not on the same resource (CPU). VPA adjusts requests while HPA adjusts replicas based on utilization. They conflict on CPU but can work together if HPA uses custom metrics and VPA manages CPU/memory.
What are the three VPA update modes?
Off (recommendations only), Initial (set requests at Pod creation), and Auto (evict and recreate Pods with updated requests). Off mode is safest for production.
Does VPA require Pod restarts?
Currently yes — in Auto mode, VPA evicts Pods and they are recreated with new requests. In-place resize (InPlacePodVerticalScaling) is being developed to avoid restarts.

Key Takeaways

  • VPA automatically right-sizes Pod resource requests based on actual usage patterns.
  • Use VPA in 'Off' mode first to get recommendations without disrupting workloads.
  • VPA and HPA should not both scale on CPU — use one for right-sizing and the other for replica scaling.
