How Do Liveness Probes Work in Kubernetes?

beginner | pods · devops · sre · CKA · CKAD
TL;DR

A liveness probe tells the kubelet whether a container is still running correctly. If the probe fails consecutively beyond the failure threshold, the kubelet kills the container and restarts it according to the Pod's restart policy.

Detailed Answer

A liveness probe is a diagnostic check that the kubelet performs periodically on a container to determine whether it is still healthy. If the probe fails a configured number of consecutive times, the kubelet kills the container and restarts it.

Why Liveness Probes Exist

Some applications enter a broken state where the process is still running but cannot serve requests -- for example, a deadlocked thread pool or a corrupted in-memory cache. Without a liveness probe, Kubernetes has no way to detect this condition because the container's process ID is still active.

Probe Mechanisms

Kubernetes supports four probe mechanisms: HTTP GET, TCP socket, exec, and (since Kubernetes 1.27 as stable) gRPC. The first three are described here; gRPC is covered in its own section below.

HTTP GET Probe

The kubelet sends an HTTP GET request to a specified path and port. Any response code between 200 and 399 is considered healthy.

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: app
      image: myapp/server:2.1
      ports:
        - containerPort: 8080
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 15
        periodSeconds: 10
        timeoutSeconds: 3
        failureThreshold: 3
        successThreshold: 1
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"

TCP Socket Probe

The kubelet attempts to open a TCP connection to the specified port. If the connection succeeds, the container is healthy.

livenessProbe:
  tcpSocket:
    port: 3306
  initialDelaySeconds: 30
  periodSeconds: 10

This is useful for databases or services that do not expose HTTP endpoints.

Exec Probe

The kubelet runs a command inside the container. If the command returns exit code 0, the container is healthy.

livenessProbe:
  exec:
    command:
      - cat
      - /tmp/healthy
  initialDelaySeconds: 5
  periodSeconds: 5

Configuration Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| initialDelaySeconds | 0 | Seconds to wait after container start before the first probe |
| periodSeconds | 10 | How often (in seconds) to perform the probe |
| timeoutSeconds | 1 | Seconds before the probe times out |
| failureThreshold | 3 | Consecutive failures before the container is killed |
| successThreshold | 1 | Consecutive successes to mark the container as healthy (must be 1 for liveness) |

What Happens When a Liveness Probe Fails

  1. The kubelet marks the probe as failed.
  2. After failureThreshold consecutive failures, the kubelet kills the container.
  3. The container is restarted according to the Pod's restartPolicy (usually Always for Deployment-managed Pods).
  4. The restart count increments, visible via kubectl get pods in the RESTARTS column.
  5. If the container keeps failing, Kubernetes applies exponential backoff (CrashLoopBackOff), delaying restarts up to 5 minutes.

# Check liveness probe status and restart count
kubectl describe pod web-app
kubectl get pod web-app -o jsonpath='{.status.containerStatuses[0].restartCount}'

Common Mistakes

Checking External Dependencies

Never check a database or downstream service in your liveness probe. If the database goes down, all your Pods will be restarted simultaneously, causing a cascading outage and making recovery harder.

# BAD -- checks external dependency
livenessProbe:
  httpGet:
    path: /healthz?check=database
    port: 8080

# GOOD -- checks only local process health
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080

Missing initialDelaySeconds Without a Startup Probe

If your application takes 60 seconds to start and the liveness probe begins checking at second 0, the container will be killed before it is ready. Either set initialDelaySeconds appropriately or use a startup probe (the preferred approach).
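A startup probe disables the liveness probe until the app has come up once, so you can give it a generous budget without slowing down steady-state detection. A sketch (the path, port, and timing values are illustrative):

```yaml
# Allows up to 5 minutes for startup (30 failures x 10s),
# after which the normal liveness probe takes over.
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```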

Probe Timeout Too Short

Setting timeoutSeconds: 1 on an endpoint that occasionally takes 2 seconds under load causes unnecessary restarts. Test your health endpoint's latency under realistic conditions.
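As a hedged sizing example: if the endpoint's measured p99 latency is around 2 seconds, a configuration with headroom might look like this (the numbers are illustrative, not a recommendation for every workload):

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  timeoutSeconds: 5      # roughly 2-3x an observed p99 of ~2s
  periodSeconds: 10
  failureThreshold: 3    # tolerates ~30s of degradation before a restart
```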

Liveness Probes and gRPC

Starting with Kubernetes 1.27 (stable), you can use native gRPC health probes:

livenessProbe:
  grpc:
    port: 50051
    service: "myapp.health.v1.Health"
  initialDelaySeconds: 10
  periodSeconds: 10

This requires your application to implement the gRPC Health Checking Protocol.
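On clusters older than 1.27, a common workaround is to bundle the open-source grpc_health_probe binary into the image and invoke it from an exec probe. A sketch (the binary path inside the image is an assumption):

```yaml
livenessProbe:
  exec:
    command: ["/bin/grpc_health_probe", "-addr=:50051"]
  initialDelaySeconds: 10
  periodSeconds: 10
```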

Best Practices

  1. Keep the health endpoint lightweight -- it should return quickly and not trigger expensive operations.
  2. Use startup probes for slow-starting apps instead of large initialDelaySeconds values.
  3. Only check local state in liveness probes -- thread pool health, memory corruption flags, internal deadlock detection.
  4. Set reasonable timeouts -- at least 2-3x your endpoint's p99 latency.
  5. Monitor restart counts -- frequent restarts indicate a misconfigured probe or an underlying application bug.

Why Interviewers Ask This

Interviewers ask this to evaluate your understanding of self-healing in Kubernetes. Misconfigured liveness probes are a common source of production incidents, so knowing the nuances matters.

Common Follow-Up Questions

What's the difference between a liveness probe and a readiness probe?
A liveness probe restarts a container when it fails. A readiness probe removes the Pod from Service endpoints without restarting it. They serve different purposes.
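The distinction is easiest to see side by side; a sketch with both probes on one container (the paths are illustrative):

```yaml
livenessProbe:           # failure -> container is restarted
  httpGet:
    path: /healthz       # checks only local process health
    port: 8080
readinessProbe:          # failure -> Pod removed from Service endpoints
  httpGet:
    path: /ready         # may also check dependencies needed to serve traffic
    port: 8080
```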
What are the probe mechanisms Kubernetes supports?
HTTP GET (checks an HTTP endpoint), TCP socket (checks if a port is open), exec (runs a command inside the container and checks the exit code), and gRPC (checks a gRPC health endpoint; stable since 1.27).
Can a liveness probe cause cascading failures?
Yes. If the probe is too aggressive or checks an external dependency, all Pods can restart simultaneously during a transient issue, causing a cascading outage.

Key Takeaways

  • Liveness probes detect when a container is deadlocked or broken and trigger automatic restarts.
  • Always set initialDelaySeconds or use a startup probe to avoid killing slow-starting containers.
  • Never check external dependencies in a liveness probe -- only check the container's own health.

Related Questions