How Does Graceful Shutdown Work in Kubernetes?

TL;DR

Graceful shutdown in Kubernetes is the process of terminating a Pod without dropping in-flight requests. It involves PreStop hooks, SIGTERM signal handling, terminationGracePeriodSeconds, and coordinating with Service endpoint removal.

Detailed Answer

Graceful shutdown is the process of terminating a Pod in a way that allows it to finish processing in-flight work without dropping requests. Getting this right is essential for zero-downtime deployments, autoscaling events, and node maintenance.

The Termination Sequence

When Kubernetes decides to terminate a Pod (rolling update, scale-down, kubectl delete, node drain), the following steps occur:

  1. Pod status set to Terminating — the API server updates the Pod's metadata and the Pod enters the Terminating state
  2. Endpoint removal begins — the endpoints controllers (Endpoints/EndpointSlice) drop the Pod from Service endpoints, and kube-proxy starts updating iptables/IPVS rules on every node
  3. PreStop hook fires — runs inside the container, if one is defined
  4. SIGTERM sent — after the PreStop hook completes, the kubelet sends SIGTERM to PID 1 in each container
  5. Grace period countdown continues — the terminationGracePeriodSeconds timer started at step 1 and keeps running through the PreStop hook and SIGTERM handling
  6. SIGKILL sent — if the container is still running when the grace period expires
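
Steps 4-6 can be reenacted locally with a toy process supervisor. This is a sketch of the SIGTERM-then-SIGKILL contract, not kubelet's actual code; the child process deliberately ignores SIGTERM to show what happens when an app never drains:

```python
# Toy reenactment of steps 4-6: SIGTERM, grace period, then SIGKILL.
import signal
import subprocess
import sys
import time

# Child that ignores SIGTERM -- simulates an app that never exits on its own
child = subprocess.Popen([
    sys.executable, "-c",
    "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)",
])
time.sleep(1)                      # let the child install its handler first

child.send_signal(signal.SIGTERM)  # step 4: polite request to stop
try:
    child.wait(timeout=2)          # step 5: a 2-second "grace period"
except subprocess.TimeoutExpired:
    child.kill()                   # step 6: SIGKILL, no appeal
child.wait()
print(child.returncode)            # -9 on Linux: terminated by SIGKILL
```

An app that installs a real SIGTERM handler and exits within the grace period never reaches the SIGKILL branch.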

Steps 2 and 3 happen in parallel, which creates a critical race condition.

The Race Condition Problem

Timeline:
0s    Pod marked Terminating
      ├── kube-proxy starts removing endpoints (takes 1-10+ seconds)
      └── PreStop hook starts (if defined)
                                    ├── SIGTERM sent
                                    ├── App starts draining
                                    └── App exits

Problem: If app exits before all kube-proxy instances update,
         some nodes still route traffic to the dead Pod → 502 errors

The Solution: PreStop Sleep

A PreStop sleep gives kube-proxy enough time to propagate endpoint removal across all nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: myapi:2.0
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              sleep:
                seconds: 10
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

The 10-second sleep gives kube-proxy time to finish removing the endpoint on every node before SIGTERM reaches the application. The total grace period (60s here) must cover the sleep plus the application's worst-case drain time.
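
The native sleep action shown above is a relatively recent addition (it reached beta around Kubernetes v1.30). On older clusters, the conventional equivalent is an exec hook, assuming the container image ships a shell and a sleep binary:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```

The native sleep action avoids that image dependency, which matters for distroless or scratch-based images.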

Application-Side SIGTERM Handling

Your application must handle SIGTERM properly: stop accepting new work, drain in-flight requests, release resources, then exit with code 0. Here is the pattern using Python's standard-library HTTP server:

# Python example: stdlib http.server; real frameworks expose
# equivalent shutdown APIs
import signal
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

server = HTTPServer(("", 8080), Handler)

def handle_sigterm(signum, frame):
    print("SIGTERM received, starting graceful shutdown")
    # 1-2. Stop accepting new connections and let the in-flight
    # request finish; shutdown() must run off the serving thread,
    # or it deadlocks waiting for serve_forever() to return
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
server.serve_forever()   # returns once shutdown() completes
server.server_close()    # 3. release the socket (close DB pools here too)
# 4. Falling off the end of the script exits cleanly with code 0

Common Frameworks and SIGTERM

| Framework | Default Behavior | Notes |
|-----------|------------------|-------|
| Go net/http | Does NOT handle SIGTERM | Use http.Server.Shutdown() |
| Node.js | Process exits immediately | Register process.on('SIGTERM', ...) |
| Spring Boot | Graceful shutdown available | Set server.shutdown=graceful |
| Nginx | Stops accepting, drains | nginx -s quit handles it well |

Calculating terminationGracePeriodSeconds

terminationGracePeriodSeconds = preStop sleep
                              + max application drain time
                              + safety buffer

Example: 10s sleep + 30s drain + 5s buffer = 45s

Always set this value explicitly rather than relying on the 30-second default.
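
The arithmetic above, as a quick sanity check (the values are the example's, not universal):

```python
# Values from the worked example above
pre_stop_sleep = 10   # endpoint propagation buffer
max_drain_time = 30   # worst-case in-flight request drain
safety_buffer = 5

termination_grace_period = pre_stop_sleep + max_drain_time + safety_buffer
print(termination_grace_period)  # 45
```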

PodDisruptionBudgets and Graceful Shutdown

For voluntary disruptions (node drain, cluster upgrades), PodDisruptionBudgets (PDBs) control how many Pods can be down simultaneously:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

This ensures at least 2 replicas remain available during disruptions, giving each Pod time to shut down gracefully before the next one is terminated.
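
With the 3-replica Deployment above, this budget permits at most one voluntary eviction at a time. Conceptually (a simplification of the real disruption controller logic):

```python
# Simplified view of how a minAvailable PDB gates evictions
healthy_pods = 3      # currently ready replicas
min_available = 2     # from the PDB spec

allowed_disruptions = max(0, healthy_pods - min_available)
print(allowed_disruptions)  # 1: only one Pod may be evicted right now
```

If any replica is already unhealthy, allowed disruptions drop to zero and the eviction API rejects drain requests until the Deployment recovers.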

Debugging Shutdown Issues

# Watch termination in real time
kubectl delete pod api-xyz --grace-period=60 &
kubectl get pod api-xyz -w

# Check if the app is handling SIGTERM
kubectl logs api-xyz --previous

# Test locally with Docker
docker stop --time 30 <container-id>

Checklist for Production-Ready Graceful Shutdown

  1. Application handles SIGTERM and drains in-flight requests
  2. PreStop hook with 5-15 second sleep to handle endpoint propagation race
  3. terminationGracePeriodSeconds set to cover PreStop + drain + buffer
  4. PodDisruptionBudget configured to prevent simultaneous termination
  5. Health checks (readiness probe) return failure during drain to stop new traffic
  6. Connection pools and database handles are closed before exit
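
For item 5, a tight readiness probe lets Kubernetes notice the drain quickly. The /readyz path is an assumption here — the point is that the application must start failing this endpoint as soon as it receives SIGTERM:

```yaml
readinessProbe:
  httpGet:
    path: /readyz        # hypothetical endpoint; must fail once draining begins
    port: 8080
  periodSeconds: 2
  failureThreshold: 1
```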

Why Interviewers Ask This

Interviewers ask this to gauge your understanding of zero-downtime operations. Production incidents often stem from applications that do not handle termination signals correctly.

Common Follow-Up Questions

What is the default terminationGracePeriodSeconds?
30 seconds. After this period, Kubernetes sends SIGKILL to forcefully terminate the container.
Why do some applications still receive traffic after starting to shut down?
Endpoint removal and PreStop hooks happen concurrently. There is a race window where kube-proxy has not yet removed the Pod's IP from iptables rules while the Pod is already terminating.
How should a stateful application handle SIGTERM?
It should stop accepting new work, finish processing in-flight requests, flush data to disk or a queue, close database connections, and then exit with code 0.

Key Takeaways

  • Kubernetes sends SIGTERM first, then SIGKILL after the grace period — your app must handle SIGTERM.
  • A PreStop sleep of 5-15 seconds mitigates the race between endpoint removal and container termination.
  • Set terminationGracePeriodSeconds based on your application's drain time, not just the default 30 seconds.
