Kubernetes DeadlineExceeded
Causes and Fixes
DeadlineExceeded means a Kubernetes resource exceeded its configured time limit. It most commonly applies to Jobs that run longer than their activeDeadlineSeconds, Deployments whose rollouts exceed progressDeadlineSeconds, and individual pods that run past the activeDeadlineSeconds set in their pod spec.
Symptoms
- Job status shows 'DeadlineExceeded' condition
- Deployment shows 'ProgressDeadlineExceeded' in conditions
- kubectl describe job shows 'Job was active longer than specified deadline'
- Pods are terminated when the deadline is reached
- Rollout appears stuck and eventually reports failure
Common Causes
- activeDeadlineSeconds set lower than the Job's actual run time
- A new application version that crashes or fails its readiness probe, stalling the rollout
- Wrong image references causing ImagePullBackOff
- Insufficient cluster resources leaving new pods Pending
- Slow image pulls or application startup exceeding the default 600s progress deadline
Step-by-Step Troubleshooting
1. Determine the Context
DeadlineExceeded applies to different resources. Identify which one is affected.
# Check Jobs
kubectl get jobs -n <namespace>
kubectl describe job <job-name> -n <namespace>
# Check Deployments
kubectl rollout status deployment/<deploy-name> -n <namespace>
kubectl describe deployment <deploy-name> -n <namespace>
2. Troubleshoot Job DeadlineExceeded
For Jobs, the activeDeadlineSeconds sets a hard time limit on the entire Job.
# Check the Job's deadline configuration
kubectl get job <job-name> -o jsonpath='{.spec.activeDeadlineSeconds}'
# Check how long the Job ran
kubectl describe job <job-name> | grep -E "Start Time|Completion Time|Active Deadline"
# Check Job conditions
kubectl get job <job-name> -o jsonpath='{.status.conditions}' | jq .
The condition will show:
{
  "type": "Failed",
  "status": "True",
  "reason": "DeadlineExceeded",
  "message": "Job was active longer than specified deadline"
}
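If you monitor many Jobs, the failure reason can be pulled out in a script. A sketch using plain grep/cut (the sample JSON stands in for the live kubectl output above):

```shell
# Sample condition JSON, standing in for:
#   kubectl get job <job-name> -o jsonpath='{.status.conditions}'
conditions='[{"type":"Failed","status":"True","reason":"DeadlineExceeded","message":"Job was active longer than specified deadline"}]'

# Extract the failure reason without needing jq
reason=$(echo "$conditions" | grep -o '"reason":"[^"]*"' | cut -d'"' -f4)
echo "$reason"   # DeadlineExceeded
```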
Fix: Increase the deadline or optimize the workload.
apiVersion: batch/v1
kind: Job
metadata:
  name: long-running-job
spec:
  activeDeadlineSeconds: 7200  # 2 hours; if unset, the Job has no deadline
  template:
    spec:
      containers:
      - name: worker
        image: myapp:v1
        command: ["./process.sh"]
      restartPolicy: OnFailure
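When choosing a value, a common rule of thumb is the benchmarked run time plus a generous buffer. A minimal sketch, assuming a measured run time of 4800 seconds (the benchmark figure is a placeholder for your own measurement):

```shell
# Benchmark figure is an assumption; substitute your own measurement.
BENCHMARK_SECONDS=4800

# Add a 50% buffer so routine variance does not kill the Job.
DEADLINE=$(( BENCHMARK_SECONDS * 3 / 2 ))
echo "activeDeadlineSeconds: ${DEADLINE}"   # activeDeadlineSeconds: 7200
```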
3. Troubleshoot Deployment ProgressDeadlineExceeded
For Deployments, progressDeadlineSeconds (default: 600s) controls how long Kubernetes waits for a rollout to make progress.
# Check rollout status
kubectl rollout status deployment/<deploy-name>
# Check deployment conditions
kubectl get deployment <deploy-name> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing")'
# Check the progress deadline
kubectl get deployment <deploy-name> -o jsonpath='{.spec.progressDeadlineSeconds}'
The condition will show:
{
  "type": "Progressing",
  "status": "False",
  "reason": "ProgressDeadlineExceeded",
  "message": "ReplicaSet \"app-5b9f\" has timed out progressing."
}
4. Investigate Why the Rollout Stalled
Check the new ReplicaSet's pods to understand why they are not becoming ready.
# Find the new ReplicaSet
kubectl get rs -n <namespace> --sort-by='.metadata.creationTimestamp' | tail -3
# Check pods in the new ReplicaSet
kubectl get pods -n <namespace> -l app=<app-label> --sort-by='.metadata.creationTimestamp'
# Describe a pod from the new ReplicaSet
kubectl describe pod <new-pod-name>
Common reasons the rollout stalls:
- Pods stuck in Pending (insufficient resources)
- Pods in CrashLoopBackOff (application error in new version)
- Pods in ImagePullBackOff (wrong image reference)
- Pods running but readiness probe failing (application not healthy)
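With many replicas, a quick tally of pod states makes the dominant failure mode obvious at a glance. A sketch that counts the STATUS column of `kubectl get pods` output (the sample text stands in for live cluster output):

```shell
# Sample output, standing in for:
#   kubectl get pods -n <namespace> -l app=<app-label>
pods='NAME         READY   STATUS             RESTARTS   AGE
app-5b9f-1   0/1     ImagePullBackOff   0          5m
app-5b9f-2   0/1     ImagePullBackOff   0          5m
app-5b9f-3   0/1     Pending            0          5m'

# Count pods per STATUS (column 3), skipping the header row
echo "$pods" | awk 'NR > 1 {count[$3]++} END {for (s in count) print count[s], s}'
```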
5. Check Readiness Probes
If pods are running but not ready, the readiness probe is failing.
# Check readiness probe config
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .
# Check if the pod is ready
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Ready")'
# Test the readiness endpoint manually
kubectl exec <pod-name> -- wget -q -O - http://localhost:8080/readyz
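If the endpoint responds but the pod still never becomes ready, compare the probe's timing fields against how long the application actually needs. A typical HTTP readiness probe looks like this (the /readyz path and port 8080 are assumptions carried over from the manual test above; tune the values to your app):

```yaml
readinessProbe:
  httpGet:
    path: /readyz      # assumed health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3  # pod is marked NotReady after 3 consecutive failures
```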
6. Roll Back the Deployment
If the new version is broken, roll back.
# Roll back to the previous version
kubectl rollout undo deployment/<deploy-name>
# Roll back to a specific revision
kubectl rollout history deployment/<deploy-name>
kubectl rollout undo deployment/<deploy-name> --to-revision=3
# Verify the rollback
kubectl rollout status deployment/<deploy-name>
7. Adjust the Progress Deadline
If the rollout is legitimate but slow (e.g., large image pulls, slow startup), increase the deadline.
kubectl patch deployment <deploy-name> \
  -p '{"spec":{"progressDeadlineSeconds": 1200}}'
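The same setting can be made declaratively in the manifest, so it survives future applies (a sketch; the Deployment name and value are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deploy               # illustrative name
spec:
  progressDeadlineSeconds: 1200 # allow up to 20 minutes without rollout progress
  # ...rest of the Deployment spec unchanged
```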
For applications with slow startups, also consider startup probes:
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 60  # allows up to 600s for startup (60 × 10s)
  periodSeconds: 10
8. Clean Up Failed Jobs
Failed Jobs remain in the cluster. Clean them up to free resources.
# Delete failed jobs
kubectl delete jobs --field-selector status.successful=0 -n <namespace>
# Set TTL to auto-clean completed/failed Jobs
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-job
spec:
  ttlSecondsAfterFinished: 3600  # delete 1 hour after the Job finishes
  activeDeadlineSeconds: 600
  template:
    spec:
      containers:
      - name: worker
        image: myapp:v1
      restartPolicy: Never
9. Verify the Fix
# For Jobs: confirm the Job completes within the deadline
kubectl get job <job-name> -w
# For Deployments: confirm the rollout completes
kubectl rollout status deployment/<deploy-name>
kubectl get deployment <deploy-name> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing")'
The Job should show Complete status, or the Deployment should show Progressing: True with reason NewReplicaSetAvailable.
How to Explain This in an Interview
I would distinguish between the two main contexts: Job activeDeadlineSeconds (which terminates the Job's pods after the deadline) and Deployment progressDeadlineSeconds (which marks the rollout as failed but does not roll back automatically). For Jobs, I would discuss setting appropriate deadlines based on expected run times plus buffer. For Deployments, I would explain that the progress deadline is a monitoring mechanism — if a rollout stalls, you need to investigate why pods are not becoming ready and decide whether to roll back manually.
Prevention
- Set realistic activeDeadlineSeconds for Jobs based on benchmarked run times
- Monitor Deployment rollouts with progressDeadlineSeconds alerts
- Use readiness probes so rollouts detect unhealthy pods
- Pre-pull images on nodes to reduce startup time
- Ensure cluster has enough headroom for rolling updates
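For the image pre-pull suggestion above, one common pattern is a DaemonSet that pulls the application image on every node ahead of the rollout. A sketch, assuming the image ships a shell (image names and labels are illustrative):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-myapp
spec:
  selector:
    matchLabels:
      app: prepull-myapp
  template:
    metadata:
      labels:
        app: prepull-myapp
    spec:
      initContainers:
      - name: prepull
        image: myapp:v2                    # the image to cache on each node
        command: ["sh", "-c", "true"]      # exit immediately; the pull is the point
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9   # tiny placeholder to keep the pod alive
```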