Kubernetes Readiness Probe Failed
Causes and Fixes
A readiness probe failure means the kubelet has determined that a container is not ready to accept traffic. Unlike liveness probe failures, which trigger container restarts, readiness failures cause the pod to be removed from Service endpoints so it stops receiving traffic. The pod keeps running but is marked not ready until the probe passes again.
Symptoms
- Pod shows Running but READY column shows 0/1
- kubectl describe pod shows 'Readiness probe failed' events
- Service endpoints list is empty or missing pods
- Traffic is not reaching the pod even though it is running
- Intermittent 502/503 errors from Services or Ingress as pods cycle between ready and not ready
Common Causes
- The application is still initializing or running migrations when the probe fires
- A hard dependency (database, cache, message queue) is unreachable
- The probe targets the wrong port, path, or scheme
- Probe timing is too strict for the application (timeoutSeconds or failureThreshold too low)
- CPU throttling or resource pressure makes the readiness endpoint respond slowly
Step-by-Step Troubleshooting
Readiness probe failures affect traffic routing — pods that are not ready do not receive traffic from Services. This is by design but can cause outages if all pods fail readiness simultaneously. This guide walks through diagnosing why readiness is failing and restoring traffic flow.
1. Check Pod Status and Events
Confirm the pod is running but not ready.
kubectl get pod <pod-name>
# READY column will show 0/1 (or N-1/N for multi-container pods)
kubectl describe pod <pod-name>
Look at:
- Conditions: The Ready condition will be False
- Events: Readiness probe failure messages with specific error details
# Get the Ready condition programmatically
kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
2. Examine the Probe Configuration
Understand exactly what the readiness probe checks.
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .
Note the probe mechanism (httpGet, tcpSocket, exec), endpoint, port, and timing parameters.
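For reference, the timing fields interact like this (a sketch; the path, port, and values are illustrative, not recommendations):

```yaml
readinessProbe:
  httpGet:
    path: /ready           # endpoint the kubelet calls
    port: 8080
  initialDelaySeconds: 10  # wait before the first probe
  periodSeconds: 10        # probe every 10s
  timeoutSeconds: 5        # each probe must answer within 5s
  failureThreshold: 3      # 3 consecutive failures -> not ready (~30s)
  successThreshold: 1      # 1 success -> ready again
```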
3. Test the Readiness Endpoint
Manually test what the readiness probe checks.
# For HTTP probes
kubectl exec <pod-name> -- curl -v http://localhost:<port><path>
# Example
kubectl exec <pod-name> -- curl -v http://localhost:8080/ready
# For TCP probes
kubectl exec <pod-name> -- bash -c 'echo > /dev/tcp/localhost/<port> && echo "OK" || echo "FAIL"'
# Note: /dev/tcp is a bash feature, not POSIX sh. If the image only ships sh, use: nc -z localhost <port>
# For exec probes
kubectl exec <pod-name> -- <probe-command>
Check the HTTP status code. Readiness probes consider any status 200-399 as success, and 400+ as failure.
4. Check Application Logs
The application logs will reveal why the readiness endpoint is returning failure.
kubectl logs <pod-name> --tail=100
# If the container has been running a while, look at recent logs
kubectl logs <pod-name> --since=5m
Look for:
- Dependency connection errors (database, cache, message queue)
- Initialization or migration failures
- Health check endpoint error messages
- Resource exhaustion warnings
5. Check Dependencies
If the readiness probe checks external dependencies, verify they are accessible.
# From inside the pod, test database connectivity
kubectl exec <pod-name> -- nc -zv <db-host> <db-port>
# Test Redis connectivity
kubectl exec <pod-name> -- nc -zv <redis-host> 6379
# Test external API
kubectl exec <pod-name> -- curl -s -o /dev/null -w "%{http_code}" http://<dependency-url>/healthz
If dependencies are down, the readiness probe is working correctly — it is protecting the Service from routing traffic to pods that cannot serve it.
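If the dependency is expected to come up eventually, one common pattern is to gate pod startup on it with an init container (a sketch; the image, host, and port are placeholders you would substitute):

```yaml
initContainers:
- name: wait-for-db
  image: busybox:1.36
  # Block pod startup until the database port accepts connections
  command: ['sh', '-c', 'until nc -z <db-host> <db-port>; do echo waiting for db; sleep 2; done']
```

Note this only covers startup ordering; the readiness probe is still what protects against dependencies failing mid-life.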
6. Check Resource Pressure
Resource contention can cause readiness probes to time out.
# Check container resource usage
kubectl top pod <pod-name> --containers
# Check resource requests and limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}' | jq .
If the container is CPU-throttled, the readiness endpoint may respond slowly. Either increase resource limits or increase the probe timeout.
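If throttling is confirmed, it often helps to adjust both sides together, for example (values illustrative, not recommendations):

```yaml
resources:
  requests:
    cpu: 250m
    memory: 256Mi
  limits:
    cpu: "1"             # headroom so the probe handler is not throttled
    memory: 512Mi
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  timeoutSeconds: 5      # allow for slower responses under load
```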
7. Check the Impact on Service Endpoints
Verify how the readiness failure affects traffic routing.
# Check service endpoints
kubectl get endpoints <service-name>
# Check how many pods are ready vs total
kubectl get pods -l <selector> -o custom-columns=NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,IP:.status.podIP
If all pods fail readiness, the Service has zero endpoints and all traffic fails. This is a full outage even though pods are running.
8. Adjust Probe Parameters
If the probe is too strict for the application's behavior, relax the parameters.
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "<container-name>",
          "readinessProbe": {
            "httpGet": {
              "path": "/ready",
              "port": 8080
            },
            "initialDelaySeconds": 10,
            "periodSeconds": 10,
            "timeoutSeconds": 5,
            "failureThreshold": 3,
            "successThreshold": 1
          }
        }]
      }
    }
  }
}'
For readiness probes, successThreshold can be set higher than 1 (unlike liveness probes where it is always 1). This requires multiple consecutive successes before marking the pod as ready again, which helps with flapping.
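As a worked example of the timing arithmetic (values illustrative): with periodSeconds: 10 and failureThreshold: 3, a pod goes not-ready roughly 30 seconds after its endpoint starts failing, and raising successThreshold to 2 then requires about 20 seconds of consecutive passes before it rejoins the endpoints:

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3   # 3 x 10s = ~30s of failures before removal
  successThreshold: 2   # 2 x 10s = ~20s of passes before re-adding
```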
9. Distinguish Readiness From Liveness
Readiness and liveness probes should often check different things.
# Liveness: Is the application process alive?
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15
  failureThreshold: 3

# Readiness: Can the application serve traffic?
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
The liveness endpoint (/healthz) should check only internal process health. The readiness endpoint (/ready) can check dependencies and initialization state — things that might be temporarily unavailable but do not require a container restart to fix.
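The distinction can be sketched in application code (hypothetical helper functions; how they wire into routes depends on your HTTP framework):

```python
def healthz() -> int:
    """Liveness: the process is up and able to respond.
    Keep this dependency-free; a failure here restarts the container."""
    return 200


def ready(db_ok: bool, cache_ok: bool) -> int:
    """Readiness: report ready only when every hard dependency is
    reachable. A 503 here sheds traffic without restarting anything."""
    return 200 if db_ok and cache_ok else 503
```

The asymmetry is deliberate: a dead database should make `ready` fail but must never make `healthz` fail, or the restart will churn pods without fixing anything.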
10. Verify Readiness Is Restored
After fixing the issue, confirm the pod becomes ready and starts receiving traffic.
# Watch the pod status
kubectl get pod <pod-name> -w
# Check the Ready condition
kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
# Verify endpoints are populated
kubectl get endpoints <service-name>
# Test the readiness endpoint manually
kubectl exec <pod-name> -- curl -s http://localhost:<port><readiness-path>
# Verify no more readiness probe failures in events
kubectl describe pod <pod-name> | grep -i "readiness\|unhealthy"
The pod is fully recovered when the READY column shows 1/1 (or N/N), the pod's IP appears in the Service endpoints, and the readiness probe consistently passes. Monitor for readiness flapping (rapidly alternating between ready and not-ready) which indicates an unstable dependency or marginal probe timeouts.
How to Explain This in an Interview
I would explain that readiness probes serve a fundamentally different purpose from liveness probes: they control traffic flow, not container lifecycle. A pod that fails readiness is removed from Service endpoints, meaning no new traffic is sent to it, but the container keeps running. This is ideal for scenarios where the pod is temporarily unable to serve (dependency outage, high load, initialization). I'd discuss how readiness probe failures interact with rolling updates — a new pod that never becomes ready will stall the rollout (eventually hitting ProgressDeadlineExceeded). I'd cover the importance of readiness probes for graceful deployment and how they should be distinct from liveness probes, potentially checking different things.
Prevention
- Design readiness endpoints to check only the conditions that prevent serving traffic
- Use startup probes for initialization instead of relying on initialDelaySeconds
- Set appropriate timeout and failure thresholds based on real-world performance
- Monitor the ratio of ready to total pods as a service health indicator
- Test readiness probes under load conditions before deploying to production
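The startup-probe suggestion above can be sketched as follows (timings illustrative): the startup probe absorbs slow initialization so the readiness probe can stay tight once the app is up.

```yaml
startupProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # up to 5s x 30 = 150s allowed for initialization
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```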