Kubernetes MinimumReplicasUnavailable

Causes and Fixes

MinimumReplicasUnavailable is a Deployment condition reason indicating that the number of available replicas has fallen below the minimum the controller requires (spec.replicas minus the rolling update strategy's maxUnavailable). The Deployment's Available condition is set to False with this reason, meaning the application may not be serving traffic reliably.
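In the Deployment's status this shows up roughly as follows (a hypothetical status fragment; replica counts vary, but the condition fields and message are what Kubernetes emits):

```yaml
# Fragment of `kubectl get deployment <name> -o yaml` output (sample values)
status:
  replicas: 3
  availableReplicas: 1
  conditions:
  - type: Available
    status: "False"
    reason: MinimumReplicasUnavailable
    message: Deployment does not have minimum availability.
```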

Symptoms

  • Deployment condition Available is False with reason MinimumReplicasUnavailable
  • kubectl get deployment shows fewer ready replicas than desired
  • Service endpoints are reduced, causing degraded performance or outages
  • HorizontalPodAutoscaler reports unable to maintain minimum replicas
  • Alerts fire for application availability dropping below threshold

Common Causes

1. Pods crashing or failing health checks
One or more pods are in CrashLoopBackOff or failing readiness probes, reducing the count of available replicas below the minimum.
2. Node failures or evictions
Nodes have gone down or are under resource pressure, evicting pods faster than the scheduler can reschedule them.
3. Insufficient cluster resources
The cluster does not have enough capacity to schedule all requested replicas, leaving some pods in Pending state.
4. Rolling update in progress with tight surge settings
A rolling update with maxUnavailable set too high or maxSurge set too low temporarily reduces available replicas below the minimum.
5. PodDisruptionBudget conflict
Voluntary disruptions (node drains, cluster autoscaler scale-downs) are removing pods, and the PDB allows more disruption than expected, or no PDB exists to limit them.

Step-by-Step Troubleshooting

When a Deployment shows MinimumReplicasUnavailable, it means fewer pods are available than required. This directly impacts your application's ability to handle traffic. This guide walks through identifying why replicas are unavailable and restoring full availability.

1. Check Deployment Status

Start by understanding the current state of the Deployment.

kubectl get deployment <deployment-name> -o wide

# Check detailed conditions
kubectl describe deployment <deployment-name>

Note the READY column output, which shows the ratio of ready to desired replicas (for example, 1/3 means only 1 of 3 desired replicas is ready). In the describe output, look at the Conditions section for both Available and Progressing conditions.
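As a quick sketch of reading that ratio, the shell below parses a hypothetical `kubectl get deployment` line (sample output, not from a real cluster) and computes how many replicas are missing:

```shell
# Sample line from `kubectl get deployment` output (hypothetical values)
line='my-app   1/3   3   1   15m'

# The second column is READY, formatted as ready/desired
ratio=$(echo "$line" | awk '{print $2}')
ready=${ratio%/*}      # text before the slash
desired=${ratio#*/}    # text after the slash
echo "$((desired - ready)) of $desired replicas unavailable"
```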

# Get conditions programmatically
kubectl get deployment <deployment-name> -o jsonpath='{range .status.conditions[*]}{.type}: {.status} ({.reason}){"\n"}{end}'

2. List All Pods for the Deployment

Identify which pods are not ready and why.

kubectl get pods -l app=<app-label> -o wide

# Get a quick summary of pod states
kubectl get pods -l app=<app-label> -o custom-columns=NAME:.metadata.name,STATUS:.status.phase,READY:.status.containerStatuses[0].ready,RESTARTS:.status.containerStatuses[0].restartCount,NODE:.spec.nodeName

Categorize the pods by their status: Running and Ready, Running but not Ready, Pending, or Failed. Each category requires a different troubleshooting approach.
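One way to get that breakdown at a glance is to count pods per status. A sketch over canned sample output (the pod lines below are hypothetical; in practice, pipe in `kubectl get pods -l app=<app-label> --no-headers` instead of the printf):

```shell
# Count pods per STATUS column; the input here is sample output, not a real cluster
summary=$(printf '%s\n' \
  'web-7f6d-abcde   1/1   Running            0   2h' \
  'web-7f6d-fghij   0/1   Pending            0   5m' \
  'web-7f6d-klmno   0/1   CrashLoopBackOff   7   30m' |
  awk '{print $3}' | sort | uniq -c | sort -rn)
echo "$summary"
```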

3. Investigate Pods That Are Not Ready

For pods that are Running but not Ready, check why readiness probes are failing.

kubectl describe pod <not-ready-pod-name>

# Check readiness probe configuration
kubectl get pod <not-ready-pod-name> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .

# Check application logs for health check failures
kubectl logs <not-ready-pod-name> --tail=50

# Test the readiness endpoint manually (use the port and path from your probe)
kubectl exec <not-ready-pod-name> -- curl -s localhost:8080/ready

If the pod is in CrashLoopBackOff, check the previous container's logs.

kubectl logs <crashing-pod-name> --previous

4. Investigate Pending Pods

For pods stuck in Pending, the issue is scheduling.

kubectl describe pod <pending-pod-name>

# Check events for scheduling failures
kubectl get events --field-selector involvedObject.name=<pending-pod-name> --sort-by=.lastTimestamp

Common reasons include insufficient CPU or memory, node affinity or anti-affinity rules that cannot be satisfied, taints with no matching tolerations, or PersistentVolumeClaims that cannot be bound.

# Check cluster resource availability
kubectl top nodes
kubectl describe nodes | grep -A10 "Allocated resources"

# Check if there are unbound PVCs
kubectl get pvc -l app=<app-label>

5. Check for Node Issues

Pods may be unavailable because their nodes have problems.

# List node statuses
kubectl get nodes

# Check for nodes that are NotReady
kubectl get nodes | grep NotReady

# For each problematic node, check conditions
kubectl describe node <node-name> | grep -A20 "Conditions:"

If nodes are under memory, disk, or PID pressure, they may be evicting pods. If a node is NotReady, its pods are marked for eviction after the eviction delay (5 minutes by default, governed by the tolerationSeconds on the node.kubernetes.io/not-ready toleration that Kubernetes adds to pods automatically).
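If you need pods rescheduled off a NotReady node faster, that delay can be tuned per pod. A sketch (assumed pod spec fragment; the automatically added toleration uses tolerationSeconds: 300):

```yaml
# Pod spec fragment: evict this pod from a NotReady node after 60s instead of 300s
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 60
```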

6. Review the Rolling Update Strategy

If this is happening during a deployment update, the rolling update strategy may be too aggressive.

kubectl get deployment <deployment-name> -o jsonpath='{.spec.strategy}' | jq .

Check maxUnavailable and maxSurge values. If maxUnavailable is set to a percentage or number that allows too many pods to be taken down simultaneously, the available count drops below the minimum.

# Adjust the strategy to be more conservative
# (maxUnavailable/maxSurge take integers or percentage strings like "25%")
kubectl patch deployment <deployment-name> -p '{"spec":{"strategy":{"rollingUpdate":{"maxUnavailable":0,"maxSurge":1}}}}'

Setting maxUnavailable to 0 ensures that no old pods are terminated until new ones are ready, preventing any drop in availability during updates.
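The minimum the controller enforces is spec.replicas minus maxUnavailable, and percentage values are rounded down when converted to a pod count. A small shell sketch of that arithmetic (pure arithmetic, no cluster needed):

```shell
replicas=10
max_unavailable_pct=25   # i.e. maxUnavailable: "25%"

# Percentages round DOWN for maxUnavailable (and round up for maxSurge)
unavailable=$(( replicas * max_unavailable_pct / 100 ))   # 2
min_available=$(( replicas - unavailable ))               # 8
echo "minimum available: $min_available of $replicas"
```

With 10 replicas and maxUnavailable of 25%, availability may legitimately drop to 8 pods during a rollout before the condition flips.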

7. Check PodDisruptionBudgets

PodDisruptionBudgets should protect your minimum availability, but they only cover voluntary disruptions.

# List PDBs in the namespace
kubectl get pdb

# Check the PDB for your application
kubectl describe pdb <pdb-name>

If no PDB exists, create one to protect against voluntary disruptions like node drains and cluster autoscaler scale-downs.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

# Save the manifest above as pdb.yaml, then apply it
kubectl apply -f pdb.yaml

8. Scale Up or Free Resources

If the issue is insufficient resources, you have several options.

# Scale down other less critical workloads
kubectl scale deployment <low-priority-deployment> --replicas=0

# If using cluster autoscaler, check its status
kubectl get pods -n kube-system | grep autoscaler
kubectl logs -n kube-system <autoscaler-pod> --tail=50

# Manually add a node if autoscaling is not available
# (cloud-provider specific)

Alternatively, reduce the resource requests of the affected Deployment if they are overprovisioned.

kubectl set resources deployment <deployment-name> --requests=cpu=100m,memory=128Mi

9. Restore Availability Quickly

If you need immediate recovery while investigating the root cause, consider scaling up.

# Scale up to compensate for unavailable replicas
kubectl scale deployment <deployment-name> --replicas=<higher-number>

# If the issue is a bad rollout, roll back
kubectl rollout undo deployment/<deployment-name>
kubectl rollout status deployment/<deployment-name>

10. Verify Full Recovery

Confirm that all replicas are available and the Deployment is healthy.

# Check deployment status
kubectl get deployment <deployment-name>

# Verify the Available condition is True
kubectl get deployment <deployment-name> -o jsonpath='{.status.conditions[?(@.type=="Available")]}'

# Confirm all pods are running and ready
kubectl get pods -l app=<app-label>

# Check service endpoints are populated
kubectl get endpoints <service-name>

The Deployment is fully recovered when the ready replicas equal the desired replicas, the Available condition is True with reason MinimumReplicasAvailable, and all service endpoints are populated. Continue monitoring for recurring issues that might cause replicas to drop again.

How to Explain This in an Interview

I would explain that MinimumReplicasUnavailable is tied to the Deployment's minReadySeconds and the Available condition. A pod is considered available only after it has been ready for at least minReadySeconds. I'd discuss how this differs from the Progressing condition (which tracks rollout progress), how the Deployment controller calculates available replicas, and how maxUnavailable in the rolling update strategy interacts with this. I'd also cover how to use PodDisruptionBudgets to prevent voluntary disruptions from violating availability requirements.
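As a concrete illustration of how those knobs fit together, a hypothetical Deployment fragment (names and values assumed):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  minReadySeconds: 10        # a pod must stay Ready this long before it counts as available
  strategy:
    rollingUpdate:
      maxUnavailable: 0      # never drop below the desired count during updates
      maxSurge: 1            # bring up one extra pod at a time instead
```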

Prevention

  • Set appropriate resource requests to ensure pods can be scheduled
  • Configure PodDisruptionBudgets to protect minimum availability
  • Use topology spread constraints to distribute replicas across failure domains
  • Monitor replica availability and set alerts before reaching critical thresholds
  • Size clusters with headroom for rescheduling during failures

Related Errors