How Do You Achieve Zero-Downtime Deployments in Kubernetes?

Level: advanced | Tags: deployments, devops, sre, CKA, CKAD
TL;DR

Zero-downtime deployments require a combination of rolling updates with maxUnavailable: 0, readiness probes, graceful shutdown handling, PodDisruptionBudgets, and preStop hooks. No single setting achieves it -- you need all layers working together.

Detailed Answer

Achieving true zero-downtime deployments in Kubernetes is more nuanced than setting strategy: RollingUpdate. It requires careful configuration at multiple layers: the Deployment strategy, health probes, graceful shutdown, network propagation, and disruption budgets.

The Five Pillars of Zero Downtime

  1. Rolling update with maxUnavailable: 0
  2. Readiness probes
  3. Graceful shutdown with preStop hooks
  4. PodDisruptionBudgets
  5. Application-level connection draining

Pillar 1 -- Rolling Update Strategy

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0

With maxUnavailable: 0, Kubernetes never terminates an old Pod until a new one is fully Ready. This guarantees the total number of available Pods never drops below the desired count.
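The availability guarantee can be checked with a toy simulation -- a deliberately simplified model of the rollout arithmetic, not the real Deployment controller:

```python
# Simplified model: each step surges new Pods up to replicas + maxSurge,
# waits for them to become Ready, then terminates only as many old Pods as
# the availability floor (replicas - maxUnavailable) allows.
def rollout_steps(replicas, max_surge, max_unavailable):
    old, new = replicas, 0
    steps = [(old, new)]
    while old > 0:
        new += min(max_surge, replicas + max_surge - (old + new))    # surge up
        old -= min(old, (old + new) - (replicas - max_unavailable))  # scale down
        steps.append((old, new))
    return steps

# With replicas=3, maxSurge=1, maxUnavailable=0, availability never dips below 3:
print(rollout_steps(3, 1, 0))  # → [(3, 0), (2, 1), (1, 2), (0, 3)]
```

At every step the total `old + new` stays at or above the replica count, which is exactly what maxUnavailable: 0 buys you.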

Pillar 2 -- Readiness Probes

readinessProbe:
  httpGet:
    path: /ready
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
  successThreshold: 1

Without a readiness probe, Kubernetes considers a Pod ready the moment its containers start. Traffic will hit an application that is still initializing, causing errors.
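A minimal sketch of such a /ready endpoint (the handler class and the startup_done flag are illustrative, not part of the manifest above): the probe returns 503 until initialization finishes, so the kubelet keeps the Pod out of Service endpoints.

```python
import threading
from http.server import BaseHTTPRequestHandler

# Flipped once startup work (cache warm-up, DB pools, config load) completes.
startup_done = threading.Event()

def readiness_status(path: str) -> int:
    """Status code the probe endpoint should return for a given path."""
    if path != "/ready":
        return 404
    return 200 if startup_done.is_set() else 503

class ReadyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(readiness_status(self.path))
        self.end_headers()
```

The important property is that readiness reflects real initialization state, not merely "the process is up".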

Pillar 3 -- Graceful Shutdown

This is where most teams miss a critical detail. When Kubernetes terminates a Pod, two things happen in parallel:

  1. The Pod is removed from Service endpoints.
  2. The kubelet starts the termination sequence: it runs the preStop hook (if any), then sends SIGTERM.

The problem is that kube-proxy and ingress controllers may take a few seconds to update their routing rules. During this window, traffic can still be sent to the terminating Pod.

The solution is a preStop hook that introduces a short delay:

spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: web-app
      image: web-app:2.0
      ports:
        - containerPort: 8080
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]

The timeline becomes:

t=0s      Pod removed from endpoints + preStop hook starts (parallel)
t=0-10s   preStop sleep -- app still serving, routing updates propagate
t=10s     preStop finishes, app receives SIGTERM, begins graceful shutdown
t=10-60s  App drains connections and exits
t=60s     SIGKILL if still running (the grace period includes the preStop time)

The 10-second sleep ensures routing rules are updated before the application starts shutting down.
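Note that the preStop hook runs inside the grace period, so the sleep eats into the application's drain budget. A quick sanity check, using the values assumed in the manifest above:

```python
# terminationGracePeriodSeconds covers preStop execution + shutdown combined.
grace_period = 60    # terminationGracePeriodSeconds
pre_stop_sleep = 10  # preStop "sleep 10"
drain_budget = grace_period - pre_stop_sleep
print(f"App has {drain_budget}s to drain after SIGTERM")
```

If your application needs longer than this remainder to drain, raise terminationGracePeriodSeconds rather than shortening the sleep.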

Pillar 4 -- PodDisruptionBudgets

Rolling updates are managed by the Deployment controller. But other operations can disrupt Pods too: node drains, cluster autoscaler scale-down, cluster upgrades. (A direct kubectl delete pod also disrupts Pods, but it bypasses PDBs, which only govern the Eviction API.)

A PodDisruptionBudget (PDB) prevents too many Pods from being disrupted simultaneously:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2              # Or use maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app

With minAvailable: 2 and 3 replicas, at most 1 Pod can be voluntarily disrupted at a time. This protects against:

  • kubectl drain during node maintenance
  • Cluster autoscaler removing nodes
  • Spot/preemptible instance termination, when a node termination handler drains the node via the Eviction API
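The eviction arithmetic is simple -- a back-of-envelope sketch of the budget calculation, not the eviction API itself: the number of permitted voluntary disruptions is the healthy Pod count minus minAvailable.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """How many voluntary evictions the PDB will permit right now."""
    return max(0, healthy_pods - min_available)

print(allowed_disruptions(3, 2))  # one Pod may be evicted at a time
print(allowed_disruptions(2, 2))  # further evictions are blocked
```

This is why a drain proceeds one Pod at a time against this PDB: each eviction drops the healthy count to 2, and the next eviction waits until a replacement Pod is Ready.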

Pillar 5 -- Application-Level Connection Draining

Your application must handle SIGTERM gracefully:

# Python example with graceful shutdown
import signal
import sys
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class MyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

server = HTTPServer(('0.0.0.0', 8080), MyHandler)

def graceful_shutdown(signum, frame):
    print("Received SIGTERM, draining connections...")
    # shutdown() blocks until serve_forever() returns, so it must run in a
    # separate thread -- calling it directly here would deadlock the main thread.
    threading.Thread(target=server.shutdown).start()

signal.signal(signal.SIGTERM, graceful_shutdown)
server.serve_forever()  # returns once shutdown() completes
server.server_close()   # close the listening socket; in-flight requests are done
sys.exit(0)

The key behaviors:

  • Stop accepting new connections after receiving SIGTERM.
  • Finish processing in-flight requests within the grace period.
  • Exit cleanly so Kubernetes does not need to send SIGKILL.

The Complete Zero-Downtime Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  revisionHistoryLimit: 5
  progressDeadlineSeconds: 300
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: web-app
          image: web-app:2.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
          startupProbe:
            httpGet:
              path: /healthz
              port: 8080
            failureThreshold: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            periodSeconds: 10
            failureThreshold: 5
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

The Endpoint Propagation Race Condition

This is the most commonly overlooked issue. Here is the detailed sequence when a Pod is terminated:

1. API server marks Pod for deletion
2. kubelet receives watch event → runs preStop hook, then sends SIGTERM
3. endpoints controller receives watch event → removes Pod from Endpoints
4. kube-proxy receives Endpoints update → updates iptables/ipvs rules
5. Ingress controller receives Endpoints update → updates upstream list

Steps 2-5 happen concurrently and asynchronously. Without the preStop sleep, the application might shut down before kube-proxy finishes updating its rules. This causes connection refused errors or 502s for a brief window.

The sleep in the preStop hook gives all components time to propagate the endpoint removal.

Testing Zero Downtime

Use a load testing tool during deployment to verify:

# Terminal 1: Generate continuous traffic and watch the stream of status codes
while true; do
  curl -s -o /dev/null -w "%{http_code}\n" http://web-app.default.svc.cluster.local/
done

# Terminal 2: Trigger a deployment update
kubectl set image deployment/web-app web-app=web-app:2.1

If you see any non-200 responses during the rollout, one of the five pillars is misconfigured.

Additional Considerations

minReadySeconds: Adds a delay after a Pod is Ready before it counts as Available. Useful as an additional safety buffer:

spec:
  minReadySeconds: 10

Topology spread constraints: Ensure Pods are spread across nodes and zones so a single node failure does not take down the service:

topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchLabels:
        app: web-app

Summary

True zero-downtime deployments require all five pillars working together: a rolling update with maxUnavailable: 0, readiness probes to gate traffic, preStop hooks to handle the endpoint propagation race condition, PodDisruptionBudgets to protect against voluntary disruptions, and application-level graceful shutdown. Missing any one of these layers can cause brief outages during otherwise well-configured deployments.

Why Interviewers Ask This

Zero-downtime deployment is a production requirement for most organizations. This question tests whether you understand the full stack of configurations needed, not just the basics.

Common Follow-Up Questions

Why do you still see errors during a rolling update even with readiness probes?
There is a race condition between the Pod being removed from endpoints and kube-proxy updating iptables rules. During that window, new requests can still be routed to the terminating Pod. A preStop hook with a short sleep solves this.
How do PodDisruptionBudgets help with zero downtime?
PDBs prevent voluntary disruptions (node drains, cluster upgrades) from taking down too many Pods at once. They complement maxUnavailable by protecting against operations outside the Deployment controller.
What role does connection draining play?
When a Pod is terminating, it should stop accepting new connections and finish processing existing ones within the terminationGracePeriodSeconds window. This prevents dropped requests.

Key Takeaways

  • Zero downtime requires multiple overlapping configurations, not just a rolling update.
  • The preStop hook solves the race condition between endpoint removal and iptables updates.
  • PodDisruptionBudgets protect against voluntary disruptions like node drains.

Related Questions