How Does Graceful Shutdown Work in Kubernetes?

TL;DR

Graceful shutdown in Kubernetes is the process of terminating a Pod without dropping in-flight requests. It involves PreStop hooks, SIGTERM signal handling, terminationGracePeriodSeconds, and coordinating with Service endpoint removal.

Detailed Answer

Graceful shutdown is the process of terminating a Pod in a way that allows it to finish processing in-flight work without dropping requests. Getting this right is essential for zero-downtime deployments, autoscaling events, and node maintenance.

The Termination Sequence

When Kubernetes decides to terminate a Pod (rolling update, scale-down, kubectl delete, node drain), the following steps occur:

  1. Pod status set to Terminating — the API server updates the Pod's metadata and the Pod enters the Terminating state
  2. Endpoint removal begins — the endpoints controllers (Endpoints/EndpointSlice) drop the Pod from Service endpoints, and kube-proxy starts updating iptables/IPVS rules on every node
  3. PreStop hook fires — runs inside the container, if one is defined
  4. SIGTERM sent — after the PreStop hook completes, the kubelet sends SIGTERM to PID 1 in each container
  5. Grace period countdown continues — the terminationGracePeriodSeconds timer started at step 1 and keeps running through the PreStop hook and SIGTERM handling
  6. SIGKILL sent — if the container is still running when the grace period expires
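
Steps 4-6 can be reenacted locally with a toy process supervisor. This is a sketch of the SIGTERM-then-SIGKILL contract, not kubelet's actual code; the child process deliberately ignores SIGTERM to show what happens when an app never drains:

```python
# Toy reenactment of steps 4-6: SIGTERM, grace period, then SIGKILL.
import signal
import subprocess
import sys
import time

# Child that ignores SIGTERM -- simulates an app that never exits on its own
child = subprocess.Popen([
    sys.executable, "-c",
    "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)",
])
time.sleep(1)                      # let the child install its handler first

child.send_signal(signal.SIGTERM)  # step 4: polite request to stop
try:
    child.wait(timeout=2)          # step 5: a 2-second "grace period"
except subprocess.TimeoutExpired:
    child.kill()                   # step 6: SIGKILL, no appeal
child.wait()
print(child.returncode)            # -9 on Linux: terminated by SIGKILL
```

An app that installs a real SIGTERM handler and exits within the grace period never reaches the SIGKILL branch.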

Steps 2 and 3 happen in parallel, which creates a critical race condition.

The Race Condition Problem

Timeline:
0s    Pod marked Terminating
      ├── kube-proxy starts removing endpoints (takes 1-10+ seconds)
      └── PreStop hook starts (if defined)
                                    ├── SIGTERM sent
                                    ├── App starts draining
                                    └── App exits

Problem: If app exits before all kube-proxy instances update,
         some nodes still route traffic to the dead Pod → 502 errors

The Solution: PreStop Sleep

A PreStop sleep gives kube-proxy enough time to propagate endpoint removal across all nodes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: myapi:2.0
          ports:
            - containerPort: 8080
          lifecycle:
            preStop:
              sleep:
                seconds: 10
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"

The 10-second sleep gives kube-proxy time to finish removing the endpoint on every node before SIGTERM reaches the application. The total grace period (60s here) must cover the sleep plus the application's worst-case drain time.
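
The native sleep action shown above is a relatively recent addition (it reached beta around Kubernetes v1.30). On older clusters, the conventional equivalent is an exec hook, assuming the container image ships a shell and a sleep binary:

```yaml
lifecycle:
  preStop:
    exec:
      command: ["sh", "-c", "sleep 10"]
```

The native sleep action avoids that image dependency, which matters for distroless or scratch-based images.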

Application-Side SIGTERM Handling

Your application must handle SIGTERM properly: stop accepting new work, drain in-flight requests, release resources, then exit with code 0. Here is the pattern using Python's standard-library HTTP server:

# Python example: stdlib http.server; real frameworks expose
# equivalent shutdown APIs
import signal
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

server = HTTPServer(("", 8080), Handler)

def handle_sigterm(signum, frame):
    print("SIGTERM received, starting graceful shutdown")
    # 1-2. Stop accepting new connections and let the in-flight
    # request finish; shutdown() must run off the serving thread,
    # or it deadlocks waiting for serve_forever() to return
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, handle_sigterm)
server.serve_forever()   # returns once shutdown() completes
server.server_close()    # 3. release the socket (close DB pools here too)
# 4. Falling off the end of the script exits cleanly with code 0

Common Frameworks and SIGTERM

| Framework | Default Behavior | Notes |
|-----------|------------------|-------|
| Go net/http | Does NOT handle SIGTERM | Use http.Server.Shutdown() |
| Node.js | Process exits immediately | Register process.on('SIGTERM', ...) |
| Spring Boot | Graceful shutdown available | Set server.shutdown=graceful |
| Nginx | Stops accepting, drains | nginx -s quit handles it well |

Calculating terminationGracePeriodSeconds

terminationGracePeriodSeconds = preStop sleep
                              + max application drain time
                              + safety buffer

Example: 10s sleep + 30s drain + 5s buffer = 45s

Always set this value explicitly rather than relying on the 30-second default.
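
The arithmetic above, as a quick sanity check (the values are the example's, not universal):

```python
# Values from the worked example above
pre_stop_sleep = 10   # endpoint propagation buffer
max_drain_time = 30   # worst-case in-flight request drain
safety_buffer = 5

termination_grace_period = pre_stop_sleep + max_drain_time + safety_buffer
print(termination_grace_period)  # 45
```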

PodDisruptionBudgets and Graceful Shutdown

For voluntary disruptions (node drain, cluster upgrades), PodDisruptionBudgets (PDBs) control how many Pods can be down simultaneously:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api

This ensures at least 2 replicas remain available during disruptions, giving each Pod time to shut down gracefully before the next one is terminated.
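
With the 3-replica Deployment above, this budget permits at most one voluntary eviction at a time. Conceptually (a simplification of the real disruption controller logic):

```python
# Simplified view of how a minAvailable PDB gates evictions
healthy_pods = 3      # currently ready replicas
min_available = 2     # from the PDB spec

allowed_disruptions = max(0, healthy_pods - min_available)
print(allowed_disruptions)  # 1: only one Pod may be evicted right now
```

If any replica is already unhealthy, allowed disruptions drop to zero and the eviction API rejects drain requests until the Deployment recovers.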

Debugging Shutdown Issues

# Watch termination in real time
kubectl delete pod api-xyz --grace-period=60 &
kubectl get pod api-xyz -w

# Check if the app is handling SIGTERM
kubectl logs api-xyz --previous

# Test locally with Docker
docker stop --time 30 <container-id>

Checklist for Production-Ready Graceful Shutdown

  1. Application handles SIGTERM and drains in-flight requests
  2. PreStop hook with 5-15 second sleep to handle endpoint propagation race
  3. terminationGracePeriodSeconds set to cover PreStop + drain + buffer
  4. PodDisruptionBudget configured to prevent simultaneous termination
  5. Health checks (readiness probe) return failure during drain to stop new traffic
  6. Connection pools and database handles are closed before exit
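
For item 5, a tight readiness probe lets Kubernetes notice the drain quickly. The /readyz path is an assumption here — the point is that the application must start failing this endpoint as soon as it receives SIGTERM:

```yaml
readinessProbe:
  httpGet:
    path: /readyz        # hypothetical endpoint; must fail once draining begins
    port: 8080
  periodSeconds: 2
  failureThreshold: 1
```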

Why Interviewers Ask This

Interviewers ask this to gauge your understanding of zero-downtime operations. Production incidents often stem from applications that do not handle termination signals correctly.

Common Follow-Up Questions

What is the default terminationGracePeriodSeconds?
30 seconds. After this period, Kubernetes sends SIGKILL to forcefully terminate the container.
Why do some applications still receive traffic after starting to shut down?
Endpoint removal and PreStop hooks happen concurrently. There is a race window where kube-proxy has not yet removed the Pod's IP from iptables rules while the Pod is already terminating.
How should a stateful application handle SIGTERM?
It should stop accepting new work, finish processing in-flight requests, flush data to disk or a queue, close database connections, and then exit with code 0.

Key Takeaways

  • Kubernetes sends SIGTERM first, then SIGKILL after the grace period — your app must handle SIGTERM.
  • A PreStop sleep of 5-15 seconds mitigates the race between endpoint removal and container termination.
  • Set terminationGracePeriodSeconds based on your application's drain time, not just the default 30 seconds.
