What Are Startup Probes and When Should You Use Them?

Q: What Are Startup Probes and When Should You Use Them?

A startup probe tells the kubelet whether a container's application has finished starting. While the startup probe is active, liveness and readiness probes are disabled. This prevents slow-starting containers from being killed by aggressive liveness probes before they are initialized.

Detailed Answer

A startup probe is a Kubernetes probe mechanism that protects slow-starting containers from being terminated by liveness probes before they are fully initialized. It first appeared as an alpha feature in Kubernetes 1.16 and became stable in Kubernetes 1.20, solving a longstanding problem with the interaction between application initialization and health checking.

The Problem Startup Probes Solve

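Consider a Java application that takes 90 seconds to start. If you configure a liveness probe with initialDelaySeconds: 10, periodSeconds: 10, and failureThreshold: 3, the first check runs about 10 seconds after the container starts and then repeats every 10 seconds. Three consecutive failures trigger a restart, so the container is killed roughly 30 to 40 seconds in, long before it finishes starting.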
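
For concreteness, the problematic setup might look like the fragment below. This is only a sketch: the /healthz path and port 8080 are assumptions borrowed from the configuration example later in this answer, not values the scenario itself specifies.

```yaml
# Liveness probe with a short initial delay and no startup probe.
livenessProbe:
  httpGet:
    path: /healthz   # assumed endpoint
    port: 8080       # assumed port
  initialDelaySeconds: 10   # first check about 10 s after the container starts
  periodSeconds: 10         # then one check every 10 s
  failureThreshold: 3       # three consecutive failures trigger a restart
# The application needs about 90 s to start, so every early check fails
# and the container is restarted before it ever becomes healthy.
```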

Before startup probes existed, the workaround was to set a very large initialDelaySeconds on the liveness probe. But this created a blind spot: the liveness probe does not run at all during the initial delay, so a container that started quickly and then deadlocked inside that window went undetected until the delay expired.
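
The workaround described above would look roughly like this. The 120-second delay is an illustrative value chosen to exceed the 90-second startup from the earlier scenario, and the endpoint and port are again assumptions.

```yaml
# Pre-startup-probe workaround: delay the first liveness check long enough
# to cover the slowest expected start.
livenessProbe:
  httpGet:
    path: /healthz   # assumed endpoint
    port: 8080       # assumed port
  initialDelaySeconds: 120   # illustrative value, sized above the 90 s startup
  periodSeconds: 10
  failureThreshold: 3
# No liveness check runs during the first 120 seconds, even if the
# application started quickly and then hung.
```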

Startup probes solve this cleanly by separating the "is it done starting?" check from the "is it still alive?" check.

How Startup Probes Work

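When a container starts, the kubelet begins running the startup probe every periodSeconds.
Liveness and readiness probes are disabled until the startup probe succeeds.
Once the startup probe succeeds, it does not run again for the lifetime of that container, and the liveness and readiness probes activate.
If the startup probe fails failureThreshold times in a row, the kubelet kills the container, which is then restarted according to the Pod's restartPolicy. The startup probe runs again from the beginning in the restarted container.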

Configuration Example

apiVersion: v1
kind: Pod
metadata:
  name: java-app
spec:
  containers:
    - name: app
      image: myapp/java-server:3.0
      ports:
        - containerPort: 8080
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
        # Total startup budget: 30 * 10 = 300 seconds (5 minutes)
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 3
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "1"
          memory: "1Gi"

In this configuration:

The startup probe gives the application up to 300 seconds (30 failures * 10 seconds) to start.
Once /healthz returns a success code (any HTTP status from 200 to 399), the startup probe stops and the liveness probe takes over with a much tighter detection window of about 30 seconds (3 failures * 10 seconds).
The readiness probe begins simultaneously, controlling when the Pod receives traffic.

Calculating the Startup Budget

The maximum time a container has to start is:

failureThreshold * periodSeconds

Set this to the worst-case startup time for your application, plus a safety margin. Common values:

| Application Type | Typical Startup | Suggested Budget |
|------------------|-----------------|------------------|
| Go / Rust microservice | 1-5s | 30s |
| Node.js / Python service | 5-15s | 60s |
| Java / Spring Boot | 30-120s | 300s |
| ML model loading | 60-600s | 900s |
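
As a worked instance of the formula, the fragment below sizes a startup probe for the 900-second budget suggested for ML model loading. It is a sketch under assumptions: the endpoint and port are carried over from the configuration example, and the split between failureThreshold and periodSeconds is one of several that reach the same total.

```yaml
startupProbe:
  httpGet:
    path: /healthz   # assumed endpoint
    port: 8080       # assumed port
  periodSeconds: 10
  failureThreshold: 90
  # Total startup budget: 90 * 10 = 900 seconds (15 minutes)
```

Setting periodSeconds: 30 with failureThreshold: 30 gives the same 900 seconds, but the kubelet only notices a successful start at the next check, so a longer period can add up to one period of delay before the liveness and readiness probes take over.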

All Three Probes Working Together

Here is the timeline of how probes interact:

Container starts
    |
    v
[Startup Probe runs]  -- liveness and readiness are DISABLED
    |
    | (startup probe succeeds)
    v
[Liveness Probe activates]  -- detects deadlocks, restarts container
[Readiness Probe activates] -- gates Service traffic
    |
    v
(Pod serves traffic until termination)

Debugging Startup Probe Failures

When a container is stuck in a restart loop due to startup probe failures:

# Check events for probe failure messages
kubectl describe pod java-app

# Check container logs to see what's happening during startup
kubectl logs java-app -c app --previous

# Look for the specific reason: the current state, then the last termination
kubectl get pod java-app -o jsonpath='{.status.containerStatuses[0].state}{"\n"}{.status.containerStatuses[0].lastState}'

Common causes of startup probe failures:

Startup takes longer than the budget: Increase failureThreshold or periodSeconds.
Wrong port or path: Verify the health endpoint is correct.
Missing dependencies: The app may be waiting for a database or config that is not available.
Insufficient resources: The container may be OOM-killed or CPU-throttled during startup.

Startup Probes vs. Init Containers

These are complementary, not competing, features:

| Feature | Purpose | Timing |
|---------|---------|--------|
| Init containers | Run prerequisite tasks (migrations, config fetch) | Before app container starts |
| Startup probes | Wait for the app container's process to initialize | After app container starts |

A typical flow might be: init container fetches config from Vault, then the app container starts, then the startup probe waits for the JVM to warm up.
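
A minimal sketch of that flow is shown below. The init container image and its fetch-config command are hypothetical placeholders for whatever tool pulls configuration from Vault. The Pod name and the shared emptyDir volume are also assumptions made for illustration. The app container and startup probe settings reuse the values from the configuration example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-app-with-init   # hypothetical name
spec:
  volumes:
    - name: app-config       # scratch volume shared by both containers
      emptyDir: {}
  initContainers:
    - name: fetch-config
      image: example/config-fetcher:1.0   # hypothetical image
      # Hypothetical command: writes the fetched config to the shared volume.
      command: ["sh", "-c", "fetch-config --out /config/app.properties"]
      volumeMounts:
        - name: app-config
          mountPath: /config
  containers:
    - name: app
      image: myapp/java-server:3.0
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: app-config
          mountPath: /config
          readOnly: true
      # Runs only after the init container has exited successfully.
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        failureThreshold: 30
        periodSeconds: 10
```

The init container must run to completion before the app container is started, so the time it takes does not count against the startup probe's budget.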

Best Practices

Always use startup probes for slow-starting applications instead of large initialDelaySeconds values on liveness probes.
Set a generous failureThreshold -- it is better to wait a little longer than to kill a container that is still starting.
Use the same endpoint for startup and liveness probes (/healthz) since they answer the same question: "is the process functional?"
Use a different endpoint for readiness (/ready) since readiness often has additional dependency checks.
Monitor startup duration using metrics or events to detect regressions in application initialization time.