Kubernetes Startup Probe Failed
Causes and Fixes
A startup probe failure means the kubelet determined that a container did not start successfully within the allowed time. The startup probe runs before the liveness and readiness probes, which are held back until it succeeds. When it fails failureThreshold times in a row, the kubelet kills the container and, subject to the pod's restartPolicy, restarts it; repeated failures typically land the pod in CrashLoopBackOff. Startup probes were designed for slow-starting applications that need more time to initialize.
Symptoms
- Container is killed before the application finishes starting
- Pod events show 'Startup probe failed' followed by container restart
- Pod enters CrashLoopBackOff with startup probe failure as the root cause
- Application logs show incomplete initialization before the kill
- Container restart count increases with 'startup probe failed' in describe output
Common Causes
- The startup budget (failureThreshold x periodSeconds) is smaller than the application's real startup time
- The application genuinely fails during initialization (missing config, unreachable dependency, bad volume mount)
- The probe targets an HTTP endpoint that only becomes available late in startup
- CPU throttling during initialization stretches startup past the probe window
- The probe's timeoutSeconds is too short for an endpoint that responds slowly while the application warms up
Step-by-Step Troubleshooting
Startup probe failures kill containers before they finish initializing. The key diagnostic question is whether the application needs more time to start or whether it is genuinely failing during startup. This guide helps answer that question and fix the issue.
1. Check Pod Events for Startup Probe Failures
Examine the pod events to confirm the startup probe is the issue.
kubectl describe pod <pod-name>
Look for events like:
Warning Unhealthy Startup probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy Startup probe failed: Get "http://10.244.1.5:8080/healthz": dial tcp 10.244.1.5:8080: connect: connection refused
Followed by:
Normal Killing Container <name> failed startup probe, will be restarted
2. Check the Startup Probe Configuration
Understand the current startup budget.
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].startupProbe}' | jq .
Calculate the total startup budget:
- Total time allowed = failureThreshold x periodSeconds
- Example: failureThreshold=30, periodSeconds=10 = 300 seconds (5 minutes)
If the application needs more than this time to start, the probe window is too small.
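The arithmetic is simple enough to script. A minimal shell sketch using the example numbers above (in practice, read the values from the jsonpath query shown earlier):

```shell
# Total startup budget = failureThreshold x periodSeconds.
# Values below mirror the example above; in a real cluster read them from:
#   kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].startupProbe}'
failureThreshold=30
periodSeconds=10
budget=$((failureThreshold * periodSeconds))
echo "Startup budget: ${budget}s"   # prints "Startup budget: 300s"
```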
3. Check Application Logs Before the Kill
The previous container's logs show what the application was doing when it was killed.
# Check logs from the previous (killed) container
kubectl logs <pod-name> --previous --tail=100
# Note: --previous shows only the most recently terminated container's logs
Look for:
- Startup progress messages (how far did it get?)
- Error messages during initialization
- Slow operations (database migrations, cache loading, index building)
- Missing configuration or environment variables
4. Measure Actual Startup Time
Determine how long the application actually needs to start.
# Start the application without the probe killing it (increase the window temporarily)
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "<container-name>",
          "startupProbe": {
            "httpGet": {
              "path": "/healthz",
              "port": 8080
            },
            "periodSeconds": 10,
            "failureThreshold": 60
          }
        }]
      }
    }
  }
}'
# Watch the pod and note when it becomes ready
kubectl get pod -l <selector> -w
# Check timestamps in application logs
kubectl logs <pod-name> | head -5 # First log entry
kubectl logs <pod-name> | grep -i "started\|ready\|listening" # Startup complete
The time between the first log entry and the "started" message is the actual startup time.
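If the application logs timestamps, the startup duration can be computed directly. A sketch assuming GNU date; the ISO 8601 timestamps are illustrative stand-ins for real log lines:

```shell
# Subtract the first log timestamp from the "started" timestamp (GNU date).
first="2024-05-01T10:00:03Z"    # timestamp of the first log entry
ready="2024-05-01T10:03:45Z"    # timestamp of the "started"/"ready" message
startup=$(( $(date -d "$ready" +%s) - $(date -d "$first" +%s) ))
echo "Measured startup time: ${startup}s"   # prints "Measured startup time: 222s"
```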
5. Increase the Startup Probe Budget
If the application legitimately needs more time, increase the probe's total budget.
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "<container-name>",
          "startupProbe": {
            "httpGet": {
              "path": "/healthz",
              "port": 8080
            },
            "periodSeconds": 10,
            "failureThreshold": 60,
            "timeoutSeconds": 5
          }
        }]
      }
    }
  }
}'
This gives the application 600 seconds (10 minutes) to start. Set the budget to at least 2x the observed maximum startup time to account for variability (cold caches, slow storage, heavy load).
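For a permanent fix, commit the probe to the Deployment manifest instead of live-patching. A sketch of the relevant fragment (container name, path, and port are placeholders); the liveness probe shown alongside it only starts running after the startup probe succeeds:

```yaml
# Deployment fragment: 600s startup budget, then a tight liveness probe.
spec:
  template:
    spec:
      containers:
      - name: <container-name>
        startupProbe:
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 10      # probe every 10s...
          failureThreshold: 60   # ...up to 60 times = 600s budget
          timeoutSeconds: 5
        livenessProbe:           # takes over once the startup probe succeeds
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 10
          failureThreshold: 3
```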
6. Use a TCP Probe Instead of HTTP
If the HTTP endpoint is not available until late in startup, switch to a TCP probe that just checks if the port is open.
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "containers": [{
          "name": "<container-name>",
          "startupProbe": {
            "tcpSocket": {
              "port": 8080
            },
            "periodSeconds": 5,
            "failureThreshold": 60
          }
        }]
      }
    }
  }
}'
TCP probes succeed as soon as the port is listening, which usually happens earlier in the startup process than when an HTTP health endpoint is fully functional.
7. Check Resource Availability During Startup
Startup often requires more resources than steady state (JVM class loading, cache warming, data loading).
# Check current resource limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}' | jq .
# Check current usage; usage pinned at the CPU limit suggests throttling
kubectl top pod <pod-name> --containers
If the container is being throttled at its CPU limit during startup, initialization takes longer. Consider temporarily raising the limits, or use the Burstable QoS class by setting requests lower than limits so the container can burst during initialization.
kubectl set resources deployment <deployment-name> \
--requests=cpu=250m,memory=512Mi \
--limits=cpu=2000m,memory=2Gi
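kubectl top reports point-in-time usage, not throttling itself. On nodes using cgroup v2, the kernel's throttle counters in cpu.stat are more direct evidence. A sketch that parses a sample cpu.stat; the counter values are illustrative, and in a real pod the file is read via kubectl exec as shown in the comment:

```shell
# cgroup v2 exposes throttle counters in cpu.stat; in a real pod:
#   kubectl exec <pod-name> -- cat /sys/fs/cgroup/cpu.stat
# Sample output (illustrative values):
stat='usage_usec 182000000
nr_periods 1200
nr_throttled 840
throttled_usec 95000000'
throttled=$(echo "$stat" | awk '/^nr_throttled/ {print $2}')
periods=$(echo "$stat" | awk '/^nr_periods/ {print $2}')
# A high nr_throttled/nr_periods ratio means the CPU limit is slowing startup.
echo "CPU throttled in ${throttled} of ${periods} scheduler periods"
```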
8. Fix Application Startup Failures
If the logs show the application fails during startup (not just slow):
# Check for missing environment variables
kubectl exec <pod-name> -- env | sort
# Check for missing config files
kubectl exec <pod-name> -- ls -la /etc/config/
# Check for dependency connectivity (requires nc in the container image)
kubectl exec <pod-name> -- nc -zv <dependency-host> <port>
# Check for volume mount issues
kubectl exec <pod-name> -- ls -la /data/
Fix the underlying startup failure (missing config, unreachable dependency, etc.); once the application initializes cleanly, the startup probe will pass.
9. Consider Using an Exec Probe
For applications with complex startup requirements, an exec probe can run a custom script that checks multiple conditions.
startupProbe:
  exec:
    command:
    - /bin/sh
    - -c
    - |
      if [ -f /tmp/app-ready ]; then
        exit 0
      else
        exit 1
      fi
  periodSeconds: 5
  failureThreshold: 120
The application writes /tmp/app-ready when it finishes initialization. This allows precise control over when the startup probe succeeds.
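The marker-file handshake can be sanity-checked outside the cluster. A minimal shell simulation of the probe logic, using the same marker path as the example above:

```shell
# Simulate the exec probe: it passes only once the app writes its ready marker.
marker="/tmp/app-ready"
probe_check() { if [ -f "$marker" ]; then echo pass; else echo fail; fi; }
rm -f "$marker"
probe_check        # prints "fail": application still initializing
touch "$marker"    # the application writes this when startup completes
probe_check        # prints "pass": the startup probe would now succeed
rm -f "$marker"    # cleanup
```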
10. Verify Startup Probe Passes
After adjusting the probe or fixing the application, verify the pod starts successfully.
# Watch the pod
kubectl get pod -l <selector> -w
# Verify the container is running and not restarting
kubectl get pod <pod-name> -o custom-columns=NAME:.metadata.name,READY:.status.containerStatuses[0].ready,RESTARTS:.status.containerStatuses[0].restartCount,STARTED:.status.containerStatuses[0].started
# Check that the startup probe succeeded (started will be true)
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].started}'
# Confirm liveness and readiness probes are now active
kubectl describe pod <pod-name> | grep -E "Liveness|Readiness|Startup"
The startup probe has succeeded when the container's started field is true, the restart count stabilizes, and the pod transitions to Ready. Once the startup probe passes, it never runs again for that container — the liveness and readiness probes take over from that point.
How to Explain This in an Interview
I would explain that startup probes were introduced in Kubernetes 1.16 (GA in 1.20) to solve a fundamental problem: how to handle slow-starting applications without making liveness probes too lenient. Before startup probes, operators had to set a high initialDelaySeconds on liveness probes, which meant an application crash after startup could go undetected for a long time. With startup probes, the startup check can have a generous failure budget for initialization, and once it succeeds, the liveness probe takes over with tighter timings. I'd discuss how to calculate the right failureThreshold and periodSeconds (total startup budget = failureThreshold x periodSeconds), and how to choose between HTTP, TCP, and exec probes for startup checking.
Prevention
- Calculate startup probe budget based on actual maximum startup time plus buffer
- Monitor application startup times and adjust probes when they change
- Use a lightweight TCP probe for startup instead of HTTP if the endpoint is not available early
- Ensure containers have sufficient resources for the initialization phase
- Log startup progress so failures can be diagnosed from logs