How Do You Monitor and Troubleshoot a Deployment Rollout?
Use kubectl rollout status to watch a rollout in real time. Combine it with kubectl describe deployment and kubectl get events to diagnose stuck or failed rollouts caused by image pull errors, resource limits, or failing health checks.
Detailed Answer
When a deployment rollout does not go as planned, you need to quickly determine what went wrong and decide whether to fix forward or roll back. Kubernetes provides several tools for monitoring and troubleshooting rollouts.
Monitoring a Rollout in Real Time
# Watch the rollout progress
kubectl rollout status deployment/web-app
# Output during a healthy rollout:
# Waiting for deployment "web-app" rollout to finish: 1 of 3 updated replicas are available...
# Waiting for deployment "web-app" rollout to finish: 2 of 3 updated replicas are available...
# deployment "web-app" successfully rolled out
The command exits with code 0 on success or non-zero on failure. This makes it ideal for CI/CD pipelines:
kubectl apply -f deployment.yaml
kubectl rollout status deployment/web-app --timeout=300s || {
  echo "Rollout failed! Rolling back..."
  kubectl rollout undo deployment/web-app
  exit 1
}
Deployment Conditions
Kubernetes tracks three conditions on every Deployment:
kubectl get deployment web-app -o jsonpath='{.status.conditions}' | jq .
| Condition | Meaning |
|---|---|
| Available | Minimum required Pods are ready and have been available for minReadySeconds. |
| Progressing | The rollout is making progress (creating or deleting Pods). |
| ReplicaFailure | The controller could not create new Pods (quota exceeded, invalid spec, etc.). |
A healthy Deployment has Available=True and Progressing=True. A stuck rollout typically shows Progressing=True with reason ReplicaSetUpdated but no new Pods becoming Ready.
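For scripting, a single condition can be pulled out with a jsonpath filter expression. This is a sketch that assumes a reachable cluster and the web-app Deployment used throughout this section:

```shell
# Print the status of the Available condition ("True" when healthy)
kubectl get deployment web-app \
  -o jsonpath='{.status.conditions[?(@.type=="Available")].status}'
```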
Diagnosing a Stuck Rollout
Step 1 -- Check Deployment status
kubectl describe deployment web-app
Look at the Conditions and Events sections:
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    ReplicaSetUpdated
Events:
  Type    Reason             Age  Message
  ----    ------             ---  -------
  Normal  ScalingReplicaSet  2m   Scaled up replica set web-app-8d9f7e0b2 to 1
Step 2 -- Check ReplicaSet status
kubectl get replicasets -l app=web-app
NAME                DESIRED   CURRENT   READY   AGE
web-app-7c8e6d9a1   3         3         3       1d    # old, still running
web-app-8d9f7e0b2   1         1         0       2m    # new, not ready
The new ReplicaSet has 1 Pod created but 0 Ready -- the Pod is failing.
Step 3 -- Check the failing Pod
# Find the new Pod
kubectl get pods -l app=web-app --sort-by=.metadata.creationTimestamp
# Describe the failing Pod
kubectl describe pod web-app-8d9f7e0b2-xyz99
# Check container logs
kubectl logs web-app-8d9f7e0b2-xyz99
Common Rollout Failure Causes
Image Pull Errors
Events:
Warning Failed 1m kubelet Failed to pull image "web-app:typo": ...
Warning Failed 1m kubelet Error: ImagePullBackOff
Fix: Correct the image name or tag, ensure the image exists, verify imagePullSecrets.
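If the tag was simply mistyped, pointing the Deployment back at a valid image triggers a fresh rollout without editing the manifest (the tag here is illustrative):

```shell
# Replace the bad tag; this creates a new ReplicaSet and restarts the rollout
kubectl set image deployment/web-app web-app=web-app:2.0
```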
Insufficient Resources
Events:
Warning FailedScheduling 1m default-scheduler 0/5 nodes are available:
5 Insufficient cpu.
Fix: Reduce resource requests, add nodes, or scale down other workloads.
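Requests can also be lowered directly from the command line; the values below are illustrative, not a recommendation:

```shell
# Reduce CPU/memory requests so the new Pods can fit on existing nodes
kubectl set resources deployment/web-app -c web-app \
  --requests=cpu=250m,memory=256Mi
```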
Failing Readiness Probe
Events:
Warning Unhealthy 1m kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Fix: Check application startup, verify the probe endpoint path and port, increase initialDelaySeconds.
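As a quick mitigation while you investigate, the probe delay can be raised with a JSON patch (a sketch; 30 seconds is an assumed value, and the container index 0 assumes the single-container spec shown below):

```shell
# Raise initialDelaySeconds on the readiness probe via a JSON patch
kubectl patch deployment web-app --type=json -p='[
  {"op": "replace",
   "path": "/spec/template/spec/containers/0/readinessProbe/initialDelaySeconds",
   "value": 30}
]'
```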
CrashLoopBackOff
Events:
Warning BackOff 1m kubelet Back-off restarting failed container
Fix: Check kubectl logs for application errors. Common causes: missing environment variables, config map errors, database connection failures.
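When a container is restart-looping, the current instance's logs are often empty; the --previous flag shows output from the last crashed instance (Pod name as in the example above):

```shell
# Logs from the previous (crashed) container instance
kubectl logs web-app-8d9f7e0b2-xyz99 --previous

# Watch restart counts climb in real time
kubectl get pods -l app=web-app -w
```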
Using progressDeadlineSeconds
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  progressDeadlineSeconds: 300
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: web-app:2.0
        ports:
        - containerPort: 8080
        readinessProbe:
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
If no progress is made for 300 seconds, the Deployment condition changes:
Conditions:
Type Status Reason
---- ------ ------
Progressing False ProgressDeadlineExceeded
Kubernetes does not automatically roll back. The failed condition is a signal for external automation or alerting.
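That external automation can be as small as a shell check on the Progressing condition. In this sketch, kubectl is stubbed with a shell function so the logic is runnable standalone; delete the stub to query a real cluster, and note the rollback line is commented out as illustrative:

```shell
# Stub standing in for kubectl so this sketch runs without a cluster;
# remove this function to query a real Deployment instead.
kubectl() {
  echo "ProgressDeadlineExceeded"
}

# Read the reason on the Progressing condition
reason=$(kubectl get deployment web-app \
  -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}')

if [ "$reason" = "ProgressDeadlineExceeded" ]; then
  echo "Deadline exceeded -- rolling back"
  # kubectl rollout undo deployment/web-app   # real rollback step
fi
```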
Pausing and Resuming Rollouts
# Pause the rollout (useful for making multiple changes)
kubectl rollout pause deployment/web-app
# Make several changes without triggering multiple rollouts
kubectl set image deployment/web-app web-app=web-app:2.1
kubectl set env deployment/web-app LOG_LEVEL=debug
kubectl set resources deployment/web-app -c web-app --limits=cpu=500m,memory=512Mi
# Resume to trigger a single rollout with all changes
kubectl rollout resume deployment/web-app
CI/CD Integration Pattern
#!/bin/bash
set -euo pipefail
DEPLOYMENT="web-app"
TIMEOUT="300s"
echo "Applying deployment..."
kubectl apply -f deployment.yaml
echo "Waiting for rollout to complete..."
if ! kubectl rollout status deployment/${DEPLOYMENT} --timeout=${TIMEOUT}; then
  echo "FAILED: Rollout did not complete within ${TIMEOUT}"
  echo "Deployment status:"
  kubectl get deployment ${DEPLOYMENT} -o wide
  echo "Pod status:"
  kubectl get pods -l app=${DEPLOYMENT} --sort-by=.metadata.creationTimestamp
  echo "Recent events:"
  kubectl get events --sort-by=.lastTimestamp --field-selector involvedObject.kind=Deployment,involvedObject.name=${DEPLOYMENT}
  echo "Rolling back..."
  kubectl rollout undo deployment/${DEPLOYMENT}
  kubectl rollout status deployment/${DEPLOYMENT} --timeout=${TIMEOUT}
  exit 1
fi
echo "Rollout complete."
Summary
Monitoring and troubleshooting rollouts requires a systematic approach: check the Deployment conditions, examine the new ReplicaSet, then inspect the failing Pods. The kubectl rollout status command is your primary monitoring tool, progressDeadlineSeconds automates failure detection, and kubectl rollout pause/resume lets you batch multiple changes into a single rollout. Building these checks into your CI/CD pipeline ensures failed deployments are caught and reverted automatically.
Why Interviewers Ask This
Debugging a stuck deployment is a common on-call task. Interviewers want to see that you have a systematic approach to diagnosing rollout failures rather than guessing.
Key Takeaways
- kubectl rollout status is the primary tool for monitoring rollout progress.
- Deployment conditions (Available, Progressing, ReplicaFailure) reveal the root cause.
- progressDeadlineSeconds automates failure detection for stuck rollouts.