Kubernetes Readiness Probe Failed

Causes and Fixes

A readiness probe failure means the kubelet has determined that a container is not ready to accept traffic. Unlike liveness probe failures, which trigger container restarts, readiness failures cause the pod to be removed from Service endpoints so it stops receiving traffic. The pod keeps running but is marked not ready until the probe passes again.

Symptoms

  • Pod shows Running but READY column shows 0/1
  • kubectl describe pod shows 'Readiness probe failed' events
  • Service endpoints list is empty or missing pods
  • Traffic is not reaching the pod even though it is running
  • Intermittent 502/503 errors from Services or Ingress as pods cycle between ready and not ready

Common Causes

1. Application dependency not available
The readiness probe checks a dependency (database, cache, external API) that is down or unreachable. The probe correctly reports the pod as not ready to serve requests.

2. Application still initializing
The application is performing startup tasks (loading cache, warming connections, running migrations) and is not yet ready to handle requests. Without a startup probe, the readiness probe reports not-ready during this period.

3. Wrong probe endpoint or port
The readiness probe is configured to check a URL path or port that does not exist or returns errors, causing the probe to fail permanently.

4. Resource contention causing slow responses
The container is under heavy CPU or memory pressure, causing the readiness endpoint to respond more slowly than the probe timeout allows.

5. Misconfigured health endpoint
The application's readiness endpoint returns an error status code (4xx or 5xx) due to a bug in the health check implementation or a configuration issue.

6. Network configuration issue
The probe cannot reach the container's port due to network policy restrictions, CNI issues, or the container binding to the wrong network interface.
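For cause 2, a startup probe lets slow initialization finish before readiness is evaluated at all. A minimal sketch, assuming an HTTP endpoint on port 8080 (the container name, path, and timing values are illustrative, not from this document):

```yaml
# Hypothetical container spec fragment: while the startup probe has not
# yet succeeded, the kubelet holds off the readiness and liveness probes.
containers:
- name: app                 # illustrative container name
  startupProbe:
    httpGet:
      path: /ready          # assumed startup/readiness endpoint
      port: 8080
    periodSeconds: 5
    failureThreshold: 30    # tolerates up to ~150s of initialization
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    periodSeconds: 10
```

Once the startup probe passes, it never runs again and the readiness probe takes over normal traffic gating.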

Step-by-Step Troubleshooting

Readiness probe failures affect traffic routing — pods that are not ready do not receive traffic from Services. This is by design but can cause outages if all pods fail readiness simultaneously. This guide walks through diagnosing why readiness is failing and restoring traffic flow.

1. Check Pod Status and Events

Confirm the pod is running but not ready.

kubectl get pod <pod-name>

# READY column will show 0/1 (or fewer than N/N for multi-container pods)

kubectl describe pod <pod-name>

Look at:

  • Conditions: The Ready condition will be False
  • Events: Readiness probe failure messages with specific error details

# Get the Ready condition programmatically
kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'

2. Examine the Probe Configuration

Understand exactly what the readiness probe checks.

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .

Note the probe mechanism (httpGet, tcpSocket, exec), endpoint, port, and timing parameters.
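For reference, the fields reported by that jsonpath query map onto a spec fragment like this (the values shown are illustrative defaults, not taken from any particular workload):

```yaml
readinessProbe:
  httpGet:                 # mechanism: httpGet, tcpSocket, or exec
    path: /ready           # assumed endpoint path
    port: 8080
  initialDelaySeconds: 5   # wait before the first probe
  periodSeconds: 10        # how often to probe
  timeoutSeconds: 2        # per-probe response deadline
  failureThreshold: 3      # consecutive failures before marking not ready
  successThreshold: 1      # consecutive successes before marking ready
```

Any field omitted from the spec falls back to its Kubernetes default (for example, timeoutSeconds defaults to 1), which is worth checking when a probe seems stricter than intended.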

3. Test the Readiness Endpoint

Manually test what the readiness probe checks.

# For HTTP probes
kubectl exec <pod-name> -- curl -v http://localhost:<port><path>

# Example
kubectl exec <pod-name> -- curl -v http://localhost:8080/ready

# For TCP probes (/dev/tcp is a bash feature; plain sh will not interpret it)
kubectl exec <pod-name> -- bash -c 'echo > /dev/tcp/localhost/<port> && echo "OK" || echo "FAIL"'

# For exec probes
kubectl exec <pod-name> -- <probe-command>

Check the HTTP status code. HTTP probes treat any status from 200 through 399 as success and 400 or above as failure.
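The manual tests above correspond to these probe mechanisms in the pod spec. A sketch of the non-HTTP variants (the port and command are illustrative assumptions):

```yaml
# Alternative 1 - TCP probe: succeeds if the port accepts a connection
readinessProbe:
  tcpSocket:
    port: 5432                                  # illustrative port
---
# Alternative 2 - exec probe: succeeds if the command exits 0
readinessProbe:
  exec:
    command: ["pg_isready", "-h", "localhost"]  # illustrative command
```

Exactly one mechanism (httpGet, tcpSocket, or exec) is allowed per probe; the `---` above separates two alternatives.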

4. Check Application Logs

The application logs will reveal why the readiness endpoint is returning failure.

kubectl logs <pod-name> --tail=100

# If the container has been running a while, look at recent logs
kubectl logs <pod-name> --since=5m

Look for:

  • Dependency connection errors (database, cache, message queue)
  • Initialization or migration failures
  • Health check endpoint error messages
  • Resource exhaustion warnings

5. Check Dependencies

If the readiness probe checks external dependencies, verify they are accessible.

# From inside the pod, test database connectivity
kubectl exec <pod-name> -- nc -zv <db-host> <db-port>

# Test Redis connectivity
kubectl exec <pod-name> -- nc -zv <redis-host> 6379

# Test external API
kubectl exec <pod-name> -- curl -s -o /dev/null -w "%{http_code}" http://<dependency-url>/healthz

If dependencies are down, the readiness probe is working correctly — it is protecting the Service from routing traffic to pods that cannot serve it.

6. Check Resource Pressure

Resource contention can cause readiness probes to time out.

# Check container resource usage
kubectl top pod <pod-name> --containers

# Check resource requests and limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}' | jq .

If the container is CPU-throttled, the readiness endpoint may respond slowly. Either increase resource limits or increase the probe timeout.
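Either fix can be sketched in the container spec. The resource quantities and endpoint below are illustrative assumptions:

```yaml
containers:
- name: app
  resources:
    requests:
      cpu: 250m
      memory: 256Mi
    limits:
      cpu: "1"            # raising the CPU limit reduces throttling
      memory: 512Mi
  readinessProbe:
    httpGet:
      path: /ready
      port: 8080
    timeoutSeconds: 5     # raised from the 1s default to tolerate slow responses
```

Raising the timeout treats the symptom; raising the CPU limit (or reducing load) treats the cause. Prefer the latter when throttling is confirmed.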

7. Check the Impact on Service Endpoints

Verify how the readiness failure affects traffic routing.

# Check service endpoints (only ready pods appear; newer clusters also expose EndpointSlices)
kubectl get endpoints <service-name>

# Check how many pods are ready vs total
kubectl get pods -l <selector> -o custom-columns=NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,IP:.status.podIP

If all pods fail readiness, the Service has zero endpoints and all traffic fails. This is a full outage even though pods are running.

8. Adjust Probe Parameters

If the probe is too strict for the application's behavior, relax the parameters.

kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "<container-name>",
          "readinessProbe": {
            "httpGet": {
              "path": "/ready",
              "port": 8080
            },
            "initialDelaySeconds": 10,
            "periodSeconds": 10,
            "timeoutSeconds": 5,
            "failureThreshold": 3,
            "successThreshold": 1
          }
        }]
      }
    }
  }
}'

For readiness probes, successThreshold can be set higher than 1 (unlike liveness probes where it is always 1). This requires multiple consecutive successes before marking the pod as ready again, which helps with flapping.
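A sketch of such an anti-flapping readiness probe (timing values are illustrative):

```yaml
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 5
  failureThreshold: 3    # ~15s of consecutive failures before removal from endpoints
  successThreshold: 2    # two consecutive passes before the pod is re-added
```

The trade-off: a higher successThreshold delays recovery after a genuine outage by (successThreshold - 1) x periodSeconds, so keep the product small.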

9. Distinguish Readiness From Liveness

Readiness and liveness probes should often check different things.

# Liveness: Is the application process alive?
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 15
  failureThreshold: 3

# Readiness: Can the application serve traffic?
readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  periodSeconds: 10
  failureThreshold: 3

The liveness endpoint (/healthz) should check only internal process health. The readiness endpoint (/ready) can check dependencies and initialization state — things that might be temporarily unavailable but do not require a container restart to fix.

10. Verify Readiness Is Restored

After fixing the issue, confirm the pod becomes ready and starts receiving traffic.

# Watch the pod status
kubectl get pod <pod-name> -w

# Check the Ready condition
kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'

# Verify endpoints are populated
kubectl get endpoints <service-name>

# Test the readiness endpoint manually
kubectl exec <pod-name> -- curl -s http://localhost:<port><readiness-path>

# Verify no more readiness probe failures in events
kubectl describe pod <pod-name> | grep -iE "readiness|unhealthy"

The pod is fully recovered when the READY column shows 1/1 (or N/N), the pod's IP appears in the Service endpoints, and the readiness probe consistently passes. Monitor for readiness flapping (rapidly alternating between ready and not-ready) which indicates an unstable dependency or marginal probe timeouts.

How to Explain This in an Interview

I would explain that readiness probes serve a fundamentally different purpose from liveness probes: they control traffic flow, not container lifecycle. A pod that fails readiness is removed from Service endpoints, meaning no new traffic is sent to it, but the container keeps running. This is ideal for scenarios where the pod is temporarily unable to serve (dependency outage, high load, initialization). I'd discuss how readiness probe failures interact with rolling updates — a new pod that never becomes ready will stall the rollout (eventually hitting ProgressDeadlineExceeded). I'd cover the importance of readiness probes for graceful deployment and how they should be distinct from liveness probes, potentially checking different things.

Prevention

  • Design readiness endpoints to check only the conditions that prevent serving traffic
  • Use startup probes for initialization instead of relying on initialDelaySeconds
  • Set appropriate timeout and failure thresholds based on real-world performance
  • Monitor the ratio of ready to total pods as a service health indicator
  • Test readiness probes under load conditions before deploying to production

Related Errors