Kubernetes DeadlineExceeded

Causes and Fixes

DeadlineExceeded means a Kubernetes resource exceeded its configured time limit. It most commonly appears on Jobs whose pods run longer than spec.activeDeadlineSeconds, on Deployments whose rollouts take longer than spec.progressDeadlineSeconds, and on individual pods that have their own spec.activeDeadlineSeconds set.

Symptoms

  • Job status shows 'DeadlineExceeded' condition
  • Deployment shows 'ProgressDeadlineExceeded' in conditions
  • kubectl describe job shows 'Job was active longer than specified deadline'
  • Pods are terminated when the deadline is reached
  • Rollout appears stuck and eventually reports failure

Common Causes

1. Job runs longer than activeDeadlineSeconds
   The Job's workload takes longer than the configured deadline. Increase activeDeadlineSeconds or optimize the workload.
2. Deployment rollout stalls
   New pods fail to become ready within progressDeadlineSeconds (default 600s). The rollout is marked as failed but is not automatically rolled back.
3. Insufficient cluster resources for rollout
   New pods cannot be scheduled during a rolling update because the cluster lacks capacity to run both old and new pods simultaneously.
4. Image pull takes too long
   Large images on slow networks can cause pods to exceed the progress deadline before they start.
5. Readiness probe never passes
   New pods start but never become ready, so the rollout never progresses. Check readiness probe configuration and application health.
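
Cause 3 is easy to quantify: a rolling update can run up to replicas + maxSurge pods of the Deployment at once, so the cluster needs that much headroom. A minimal sketch of the arithmetic (the replica count is illustrative):

```shell
# Rolling updates need headroom: up to replicas + maxSurge pods of the
# Deployment can run at once (defaults: maxSurge=25%, maxUnavailable=25%).
REPLICAS=8
MAX_SURGE=2                       # 25% of 8, rounded up
PEAK=$(( REPLICAS + MAX_SURGE ))
echo "peak pods during rollout: $PEAK"
```

If the nodes cannot fit the peak count, new pods sit in Pending until the progress deadline fires.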

Step-by-Step Troubleshooting

1. Determine the Context

DeadlineExceeded applies to different resources. Identify which one is affected.

# Check Jobs
kubectl get jobs -n <namespace>
kubectl describe job <job-name> -n <namespace>

# Check Deployments
kubectl rollout status deployment/<deploy-name> -n <namespace>
kubectl describe deployment <deploy-name> -n <namespace>

2. Troubleshoot Job DeadlineExceeded

For Jobs, activeDeadlineSeconds sets a hard wall-clock limit on the entire Job, including retries. Once the deadline passes, all of the Job's running pods are terminated and the Job is marked Failed.

# Check the Job's deadline configuration
kubectl get job <job-name> -o jsonpath='{.spec.activeDeadlineSeconds}'

# Check how long the Job ran
kubectl describe job <job-name> | grep -E "Start Time|Completion Time|Active Deadline"

# Check Job conditions
kubectl get job <job-name> -o jsonpath='{.status.conditions}' | jq .

The condition will show:

{
  "type": "Failed",
  "status": "True",
  "reason": "DeadlineExceeded",
  "message": "Job was active longer than specified deadline"
}
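
If jq is not available, the reason can be extracted from that conditions JSON with plain grep and cut. A sketch, using a sample payload in place of the live kubectl output:

```shell
# Sketch: pull the failure reason out of the conditions JSON without jq.
# CONDITIONS is a sample payload, as returned by the jsonpath query above.
CONDITIONS='[{"type":"Failed","status":"True","reason":"DeadlineExceeded"}]'
REASON=$(echo "$CONDITIONS" | grep -o '"reason":"[^"]*"' | cut -d'"' -f4)
echo "$REASON"
```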

Fix: Increase the deadline or optimize the workload.

apiVersion: batch/v1
kind: Job
metadata:
  name: long-running-job
spec:
  activeDeadlineSeconds: 7200  # 2 hours (unset means no limit)
  template:
    spec:
      containers:
        - name: worker
          image: myapp:v1
          command: ["./process.sh"]
      restartPolicy: OnFailure
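
When picking the value, one reasonable approach is to start from a benchmarked run time and add headroom rather than guessing. A minimal sketch, where the benchmark figure is a hypothetical measured value:

```shell
# Sketch: derive activeDeadlineSeconds from a benchmarked run time plus headroom.
# BENCH_SECONDS is a hypothetical measured value for your workload.
BENCH_SECONDS=2700
DEADLINE=$(( BENCH_SECONDS * 3 / 2 ))   # 50% headroom over the benchmark
echo "$DEADLINE"
```

The 50% buffer is an assumption; size it to your workload's observed variance.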

3. Troubleshoot Deployment ProgressDeadlineExceeded

For Deployments, progressDeadlineSeconds (default: 600s) controls how long Kubernetes waits for a rollout to make progress.

# Check rollout status
kubectl rollout status deployment/<deploy-name>

# Check deployment conditions
kubectl get deployment <deploy-name> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing")'

# Check the progress deadline
kubectl get deployment <deploy-name> -o jsonpath='{.spec.progressDeadlineSeconds}'

The condition will show:

{
  "type": "Progressing",
  "status": "False",
  "reason": "ProgressDeadlineExceeded",
  "message": "ReplicaSet \"app-5b9f\" has timed out progressing."
}
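
In CI, this condition's reason can gate the pipeline. A sketch, assuming the reason has already been extracted; here it is hardcoded for illustration, but in practice it would come from kubectl's JSONPath query against the Progressing condition:

```shell
# Sketch: gate a CI step on the Progressing condition's reason.
# REASON is a hypothetical value; in practice:
#   kubectl get deployment <name> \
#     -o jsonpath='{.status.conditions[?(@.type=="Progressing")].reason}'
REASON="ProgressDeadlineExceeded"
if [ "$REASON" = "ProgressDeadlineExceeded" ]; then
  STATUS="rollout failed"
else
  STATUS="rollout progressing"
fi
echo "$STATUS"
```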

4. Investigate Why the Rollout Stalled

Check the new ReplicaSet's pods to understand why they are not becoming ready.

# Find the new ReplicaSet
kubectl get rs -n <namespace> --sort-by='.metadata.creationTimestamp' | tail -3

# Check pods in the new ReplicaSet
kubectl get pods -n <namespace> -l app=<app-label> --sort-by='.metadata.creationTimestamp'

# Describe a pod from the new ReplicaSet
kubectl describe pod <new-pod-name>

Common reasons the rollout stalls:

  • Pods stuck in Pending (insufficient resources)
  • Pods in CrashLoopBackOff (application error in new version)
  • Pods in ImagePullBackOff (wrong image reference)
  • Pods running but readiness probe failing (application not healthy)
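
Those reasons map naturally to first debugging steps. A sketch that branches on a container's waiting reason; the value is hardcoded for illustration, but would normally come from the pod's containerStatuses:

```shell
# Sketch: map a container waiting reason to a first debugging step.
# REASON is a hypothetical value; in practice:
#   kubectl get pod <pod> -o jsonpath='{.status.containerStatuses[0].state.waiting.reason}'
REASON="ImagePullBackOff"
case "$REASON" in
  CrashLoopBackOff) NEXT="check logs of the previous container run" ;;
  ImagePullBackOff) NEXT="verify the image reference and pull credentials" ;;
  *)                NEXT="describe the pod for scheduling or probe failures" ;;
esac
echo "$NEXT"
```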

5. Check Readiness Probes

If pods are running but not ready, the readiness probe is failing.

# Check readiness probe config
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].readinessProbe}' | jq .

# Check if the pod is ready
kubectl get pod <pod-name> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Ready")'

# Test the readiness endpoint manually
kubectl exec <pod-name> -- wget -q -O - http://localhost:8080/readyz

6. Roll Back the Deployment

If the new version is broken, roll back.

# Roll back to the previous version
kubectl rollout undo deployment/<deploy-name>

# Roll back to a specific revision
kubectl rollout history deployment/<deploy-name>
kubectl rollout undo deployment/<deploy-name> --to-revision=3

# Verify the rollback
kubectl rollout status deployment/<deploy-name>

7. Adjust the Progress Deadline

If the rollout is legitimate but slow (e.g., large image pulls, slow startup), increase the deadline.

kubectl patch deployment <deploy-name> \
  -p '{"spec":{"progressDeadlineSeconds": 1200}}'

For applications with slow startups, also consider startup probes:

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 60
  periodSeconds: 10
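
A startup probe tolerates failureThreshold * periodSeconds of startup time before the kubelet restarts the container, so the values above give a 10-minute budget:

```shell
# Startup budget = failureThreshold * periodSeconds (values from the probe above).
FAILURE_THRESHOLD=60
PERIOD_SECONDS=10
BUDGET=$(( FAILURE_THRESHOLD * PERIOD_SECONDS ))
echo "startup budget: ${BUDGET}s"
```

Size this budget to the slowest observed cold start, and keep progressDeadlineSeconds at least as large.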

8. Clean Up Failed Jobs

Failed Jobs remain in the cluster. Clean them up to free resources.

# Delete jobs that have not succeeded
# (note: status.successful=0 also matches Jobs that are still running)
kubectl delete jobs --field-selector status.successful=0 -n <namespace>

# Set TTL to auto-clean completed/failed Jobs
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-job
spec:
  ttlSecondsAfterFinished: 3600  # Delete 1 hour after completion
  activeDeadlineSeconds: 600
  template:
    spec:
      containers:
        - name: worker
          image: myapp:v1
      restartPolicy: Never

9. Verify the Fix

# For Jobs: confirm the Job completes within the deadline
kubectl get job <job-name> -w

# For Deployments: confirm the rollout completes
kubectl rollout status deployment/<deploy-name>
kubectl get deployment <deploy-name> -o jsonpath='{.status.conditions}' | jq '.[] | select(.type=="Progressing")'

The Job should show Complete status, or the Deployment should show Progressing: True with reason NewReplicaSetAvailable.

How to Explain This in an Interview

I would distinguish between the two main contexts: Job activeDeadlineSeconds (which terminates the Job's pods after the deadline) and Deployment progressDeadlineSeconds (which marks the rollout as failed but does not roll back automatically). For Jobs, I would discuss setting appropriate deadlines based on expected run times plus buffer. For Deployments, I would explain that the progress deadline is a monitoring mechanism — if a rollout stalls, you need to investigate why pods are not becoming ready and decide whether to roll back manually.

Prevention

  • Set realistic activeDeadlineSeconds for Jobs based on benchmarked run times
  • Monitor Deployment rollouts with progressDeadlineSeconds alerts
  • Use readiness probes so rollouts detect unhealthy pods
  • Pre-pull images on nodes to reduce startup time
  • Ensure cluster has enough headroom for rolling updates
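
Several of these prevention points can live in one manifest. A minimal Deployment fragment combining a finite progress deadline with a readiness probe; all names, paths, and values are illustrative:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  progressDeadlineSeconds: 900   # generous but finite rollout window
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: app
          image: myapp:v2
          readinessProbe:
            httpGet:
              path: /readyz
              port: 8080
            periodSeconds: 5
```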

Related Errors