How Do You Troubleshoot Kubernetes Deployment Issues?
Troubleshooting Deployment issues involves checking rollout status, inspecting Pod events, reviewing container logs, and verifying resource availability. Common problems include image pull errors, crashlooping containers, insufficient resources, and failed health checks.
Detailed Answer
When a Deployment is not behaving as expected, a systematic approach helps you identify the root cause quickly. Here is a step-by-step troubleshooting methodology.
Step 1: Check Rollout Status
kubectl rollout status deployment/web
# Output examples:
#   deployment "web" successfully rolled out
#   Waiting for deployment "web" rollout to finish: 1 out of 3 new replicas have been updated...
#   error: deployment "web" exceeded its progress deadline
If the rollout is stuck, the progressDeadlineSeconds (default 600s) may have been exceeded.
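For slow-starting workloads, you can raise the deadline in the Deployment spec. A minimal sketch (the 900-second value is just an illustration, not a recommendation):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  # Give slow-starting Pods more time before the rollout is
  # marked as failed (default: 600 seconds)
  progressDeadlineSeconds: 900
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:2.0
```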
Step 2: Inspect the Deployment
kubectl describe deployment web
Key sections to examine:
- Conditions: Look for Available, Progressing, and ReplicaFailure
- Events: Shows scaling decisions and errors
- Replicas: Compare desired, updated, ready, and available counts
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing False ProgressDeadlineExceeded
Step 3: Check ReplicaSets
kubectl get rs -l app=web
# NAME DESIRED CURRENT READY AGE
# web-abc123 3 3 3 2d (old - stable)
# web-def456 3 3 0 5m (new - not ready)
If the new ReplicaSet has Pods that are not ready, drill into those Pods.
Step 4: Inspect Pods
kubectl get pods -l app=web
kubectl describe pod web-def456-xyz
Common Pod Error States
ImagePullBackOff / ErrImagePull
Events:
Warning Failed kubelet Failed to pull image "myapp:latest": rpc error
Warning Failed kubelet Error: ImagePullBackOff
Causes and fixes:
- Wrong image name or tag → verify image exists in the registry
- Missing imagePullSecrets → create and attach the secret
- Private registry authentication → check the docker-registry secret
# Verify image exists
docker manifest inspect myapp:2.0
# Check imagePullSecrets
kubectl get pod web-xyz -o jsonpath='{.spec.imagePullSecrets}'
# Create a pull secret
kubectl create secret docker-registry regcred \
--docker-server=registry.example.com \
--docker-username=user \
--docker-password=pass
CrashLoopBackOff
Events:
Warning BackOff kubelet Back-off restarting failed container
Debug steps:
# View the crash logs (--previous shows logs from the last terminated container)
kubectl logs web-xyz --previous
# Check if it was OOMKilled
kubectl describe pod web-xyz | grep -A 3 "Last State"
# Last State: Terminated
# Reason: OOMKilled
# Exit Code: 137
# Try running the container interactively
kubectl run debug --image=myapp:2.0 --rm -it -- /bin/sh
Pending Pods
kubectl describe pod web-xyz
# Events:
# Warning  FailedScheduling  0/5 nodes are available:
#   3 Insufficient cpu, 2 node(s) had taint that the pod didn't tolerate
Common causes:
- Insufficient cluster resources → check node allocatable vs. requests
- Taints without matching tolerations → check node taints
- PVC not bound → check PV availability
- Node selector or affinity mismatch → verify node labels
# Check node resources
kubectl describe nodes | grep -A 5 "Allocated resources"
# Check node taints
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.taints}{"\n"}{end}'
Step 5: Check Health Probes
Failed readiness probes prevent Pods from receiving traffic and stall rollouts:
kubectl describe pod web-xyz | grep -A 10 "Readiness"
# Readiness probe failed: HTTP probe failed with statuscode: 503
Fixes:
- Increase initialDelaySeconds if the app needs time to start
- Check the probe endpoint actually returns 200
- Verify the probe port matches the container port
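Putting those fixes together, a readiness probe whose delay and port line up with the container might look like this (the path, port, and timing values are placeholders for illustration):

```yaml
containers:
  - name: web
    image: myapp:2.0
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz       # must actually return 200 when ready
        port: 8080           # must match containerPort above
      initialDelaySeconds: 15  # give the app time to start
      periodSeconds: 5
```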
Step 6: Review Events Cluster-Wide
# All events sorted by time
kubectl get events --sort-by='.lastTimestamp' -A
# Events for a specific namespace
kubectl get events -n production --field-selector reason=FailedScheduling
Deployment Troubleshooting Flowchart
Deployment issue
├── kubectl rollout status → Stuck?
│ ├── Yes → Check new ReplicaSet Pods
│ │ ├── Pending → Resource/scheduling issue
│ │ ├── CrashLoopBackOff → Check logs --previous
│ │ ├── ImagePullBackOff → Check image name/secrets
│ │ └── Running but not Ready → Check readiness probe
│ └── No → Rollout succeeded, issue is elsewhere
├── Wrong version running?
│ └── Check image tag on running Pods
└── Pods running but not receiving traffic?
└── Check Service selector matches Pod labels
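For that last branch, the Service's spec.selector must match the labels on the Pod template exactly. A sketch using the same names as the examples above (the ports are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web         # must match the Pod template's labels exactly
  ports:
    - port: 80
      targetPort: 8080
```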
Useful Troubleshooting Commands Summary
# Quick health overview
kubectl get deployment,rs,pods -l app=web
# Rollout history
kubectl rollout history deployment web
# Rollback if needed
kubectl rollout undo deployment web
# Watch Pods in real-time
kubectl get pods -l app=web -w
# Get YAML of a running Pod for comparison
kubectl get pod web-xyz -o yaml
Why Interviewers Ask This
Deployment troubleshooting is one of the most practical skills tested in interviews. It demonstrates your ability to systematically diagnose production issues under pressure.
Key Takeaways
- Always start with kubectl rollout status, then drill down into describe and logs.
- ImagePullBackOff and CrashLoopBackOff are the two most common Pod failure patterns.
- A stuck rollout is often caused by readiness probe failures on the new ReplicaSet's Pods.