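The Horizontal Pod Autoscaler (HPA) automatically adjusts the number of Pod replicas based on observed CPU utilization, memory utilization, or custom metrics. On each sync period (15 seconds by default), it queries the resource, custom, or external metrics APIs, computes the desired replica count from the ratio of the current metric value to its target, and updates the replica count of the target Deployment, StatefulSet, or other scalable resource through its scale subresource.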
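A single HPA evaluation for a CPU utilization target can be sketched in Python as below. This is a minimal model under stated assumptions: utilization is taken as the Pods' total CPU usage divided by their total CPU requests (which equals the mean per-Pod utilization when all requests are identical), every Pod is ready and reporting metrics, and the controller's special handling of unready Pods and missing metrics is ignored. The function names are illustrative only.

```python
import math

def current_utilization(usage_millicores, request_millicores):
    """Aggregate CPU utilization (%) of the target's Pods relative to their requests."""
    # Simplification: all Pods are ready, report metrics, and have a CPU request set.
    return 100 * sum(usage_millicores) / sum(request_millicores)

def desired_replicas(current_replicas, utilization, target_utilization,
                     min_replicas, max_replicas):
    """Apply the target-utilization ratio, then clamp to the HPA's replica bounds."""
    raw = math.ceil(current_replicas * utilization / target_utilization)
    return max(min_replicas, min(max_replicas, raw))

# Three Pods, each requesting 500m CPU and currently using 250m, 350m and 300m.
util = current_utilization([250, 350, 300], [500, 500, 500])
print(util)                                                            # 60.0
print(desired_replicas(3, util, 50, min_replicas=2, max_replicas=10))  # 4
```

Because utilization is measured against requests rather than node capacity, Pods without CPU requests give the HPA nothing to compute a CPU utilization ratio from.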
Autoscaling Interview Questions
Why Autoscaling Matters in Interviews
Autoscaling directly impacts both cost efficiency and application reliability, making it a high-value interview topic. Organizations need engineers who can configure scaling policies that respond to real-world traffic patterns without wasting resources or degrading user experience.
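Interviewers often start with HPA fundamentals: "How does the HPA calculate the desired replica count?" Candidates should know the formula (desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)) and the behavior around it: the default 15-second sync period, the tolerance (0.1 by default) within which the controller skips scaling because the ratio is close enough to 1.0, and the scale-down stabilization window (300 seconds by default) that keeps the replica count from flapping when metrics fluctuate. Follow-up questions explore custom metrics: "How would you scale based on requests per second instead of CPU?" A strong answer notes that this requires a metrics adapter exposing the metric through the custom metrics API, with the HPA targeting an average value per Pod.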
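The tolerance check and scale-down stabilization can be sketched in Python as follows, using a requests-per-second target as the custom-metric case. This is a simplified model, assuming the default tolerance of 0.1 and a 300-second scale-down window; it keeps all recommendations in memory and omits the rate-limiting `behavior` policies that autoscaling/v2 lets you configure. The class and function names are hypothetical.

```python
import math
from collections import deque

DEFAULT_TOLERANCE = 0.1  # ratio band around 1.0 in which scaling is skipped

def recommend(current_replicas, current_value, target_value,
              tolerance=DEFAULT_TOLERANCE):
    """Raw recommendation: ceil(currentReplicas * currentValue / targetValue)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    return math.ceil(current_replicas * ratio)

class ScaleDownStabilizer:
    """Returns the highest recommendation seen within the trailing window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommendation)

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Discard recommendations that have aged out of the window.
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(r for _, r in self.history)

# Target: 100 requests per second on average per Pod.
stabilizer = ScaleDownStabilizer()
print(stabilizer.stabilize(0, recommend(4, 150, 100)))   # burst: 6
print(stabilizer.stabilize(15, recommend(6, 40, 100)))   # dip: held at 6
print(stabilizer.stabilize(330, recommend(6, 40, 100)))  # window passed: 3
```

Taking the maximum over the window means a higher recommendation takes effect immediately, while a scale-down happens only once every recommendation in the window agrees with it.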
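The interaction between the HPA and the Vertical Pod Autoscaler (VPA) is a common advanced question. The HPA scales out by changing the replica count, while the VPA scales up by changing each Pod's CPU and memory requests. Because HPA utilization is measured relative to those same requests, running both on CPU or memory for the same workload creates competing feedback loops; the usual guidance is to avoid that combination, although the VPA can run alongside an HPA driven by custom or external metrics. Candidates should explain when to use each and how they complement one another.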
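The conflict rule can be expressed as a small check, sketched here in Python. The dictionaries loosely mirror the shape of autoscaling/v2 metric specs, and the VPA side is reduced to a list of resource names it controls, assumed here to be cpu and memory. The sketch only inspects `Resource`-type HPA metrics, and the helper names are hypothetical rather than part of any Kubernetes tool.

```python
# Assumption: these are the dimensions the VPA adjusts and HPA Resource metrics measure.
CONFLICT_PRONE = {"cpu", "memory"}

def hpa_resource_metrics(metric_specs):
    """Names of resources an HPA scales on via 'Resource' metrics."""
    return {m["resource"]["name"] for m in metric_specs if m["type"] == "Resource"}

def conflicting_dimensions(hpa_metric_specs, vpa_controlled_resources):
    """Resource dimensions both autoscalers would act on for the same workload."""
    return (hpa_resource_metrics(hpa_metric_specs)
            & set(vpa_controlled_resources)
            & CONFLICT_PRONE)

cpu_hpa = [{"type": "Resource", "resource": {"name": "cpu"}}]
rps_hpa = [{"type": "Pods", "pods": {"metric": {"name": "requests_per_second"}}}]
print(conflicting_dimensions(cpu_hpa, ["cpu", "memory"]))  # {'cpu'}
print(conflicting_dimensions(rps_hpa, ["cpu", "memory"]))  # set()
```

An empty result corresponds to the complementary setup: the HPA reacts to a traffic metric while the VPA right-sizes requests.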
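Cluster Autoscaler questions focus on the relationship between Pod resource requests, node capacity, and scale-up triggers. The Cluster Autoscaler adds nodes when Pods remain Pending because the scheduler cannot place them, and it bases that decision on resource requests rather than actual usage. Candidates should be able to diagnose why Pods are Pending (insufficient resources versus affinity or node-selector constraints that no node group can satisfy) and explain how the Cluster Autoscaler simulates scheduling against each node group's template and uses its configured expander strategy to decide which group to expand.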
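The scale-up decision for a single Pending Pod can be approximated with the heavily simplified Python sketch below. A node-selector match stands in for full affinity evaluation, and the scoring is modeled loosely on the idea behind the least-waste expander (prefer the group that leaves the smallest idle CPU and memory fractions). It is not the Cluster Autoscaler's actual implementation, which runs scheduler simulations over all pending Pods at once; all class and group names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class NodeGroup:
    name: str
    cpu_millicores: int  # allocatable CPU of one node built from the group's template
    memory_mib: int      # allocatable memory of that node
    labels: dict = field(default_factory=dict)

@dataclass
class PendingPod:
    cpu_millicores: int  # resource requests, not observed usage
    memory_mib: int
    node_selector: dict = field(default_factory=dict)

def fits(pod, group):
    """Would the Pod schedule onto a fresh node from this group?"""
    labels_ok = all(group.labels.get(k) == v for k, v in pod.node_selector.items())
    return (labels_ok
            and pod.cpu_millicores <= group.cpu_millicores
            and pod.memory_mib <= group.memory_mib)

def waste(pod, group):
    """Fraction of CPU plus fraction of memory left idle after placing the Pod."""
    cpu_idle = (group.cpu_millicores - pod.cpu_millicores) / group.cpu_millicores
    mem_idle = (group.memory_mib - pod.memory_mib) / group.memory_mib
    return cpu_idle + mem_idle

def choose_node_group(pod, groups):
    candidates = [g for g in groups if fits(pod, g)]
    if not candidates:
        return None  # no template can host the Pod: adding nodes will not help
    return min(candidates, key=lambda g: waste(pod, g)).name

groups = [
    NodeGroup("general-small", 2000, 4096, {"pool": "general"}),
    NodeGroup("general-large", 8000, 32768, {"pool": "general"}),
    NodeGroup("gpu", 8000, 32768, {"pool": "gpu"}),
]
print(choose_node_group(PendingPod(1500, 3072, {"pool": "general"}), groups))  # general-small
print(choose_node_group(PendingPod(4000, 8192, {"pool": "general"}), groups))  # general-large
print(choose_node_group(PendingPod(1000, 1024, {"pool": "arm"}), groups))      # None
```

The `None` case is the one worth raising in an interview: a Pod Pending on a constraint that no node group satisfies stays Pending no matter how many nodes are added.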
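PodDisruptionBudget (PDB) questions round out the topic: "How do you ensure a rolling update or node drain does not take your service below minimum availability?" A strong answer separates the two cases. PDBs limit voluntary disruptions that go through the Eviction API, such as `kubectl drain` and Cluster Autoscaler scale-down. Deployment and StatefulSet rolling updates are not limited by PDBs; their availability is governed by the workload's own update strategy, such as a Deployment's maxUnavailable and maxSurge settings. Being able to connect PDBs to both autoscaling and maintenance operations demonstrates comprehensive understanding.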
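How a PDB bounds a drain can be sketched in Python as below. This is a minimal model that handles only integer minAvailable or maxUnavailable values (not percentages), and the helper names are hypothetical. Real `kubectl drain` keeps retrying evictions that the API server rejects with HTTP 429 until the budget allows them or a timeout expires; this sketch stops at the first rejection to show where the budget binds.

```python
def disruptions_allowed(current_healthy, expected_pods,
                        min_available=None, max_unavailable=None):
    """Voluntary evictions a PDB would permit right now (integer values only)."""
    # A PDB sets exactly one of minAvailable or maxUnavailable.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available or max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)

def drain(node_pods, healthy, expected, **budget):
    """Evict Pods one at a time, stopping when the budget is exhausted."""
    evicted = []
    for pod in node_pods:
        if disruptions_allowed(healthy, expected, **budget) == 0:
            break  # the Eviction API would reject further requests (HTTP 429)
        evicted.append(pod)
        healthy -= 1  # the evicted Pod is unhealthy until its replacement is ready
    return evicted

print(disruptions_allowed(5, 5, min_available=4))                   # 1
print(drain(["web-a", "web-b", "web-c"], 5, 5, max_unavailable=2))  # ['web-a', 'web-b']
```

The same budget applies when the Cluster Autoscaler removes an underutilized node, which is why an overly strict PDB (for example, minAvailable equal to the replica count) can block both maintenance drains and scale-down.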