What Is TTL-After-Finished for Jobs?

Level: intermediate | Tags: jobs, devops, sre, backend developer | Certifications: CKA, CKAD
TL;DR

The ttlSecondsAfterFinished field automatically deletes a Job and its Pods a specified number of seconds after the Job finishes, whether it succeeded or failed. This prevents completed Jobs from accumulating and consuming etcd storage.

Detailed Answer

By default, completed Kubernetes Jobs and their Pods remain in the cluster indefinitely. This is useful for debugging — you can inspect logs and exit codes — but it creates a maintenance burden. The ttlSecondsAfterFinished field solves this by automatically cleaning up finished Jobs.

Basic Usage

apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator
spec:
  ttlSecondsAfterFinished: 3600    # Delete 1 hour after completion
  template:
    spec:
      containers:
        - name: generator
          image: myapp/report-gen:v2
          command: ["python", "generate_report.py"]
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
      restartPolicy: Never
  backoffLimit: 3

After the Job finishes (successfully or not), Kubernetes waits 3600 seconds, then deletes:

  • The Job object
  • All Pods created by the Job (both completed and failed)

TTL Values for Common Scenarios

| Scenario | TTL Value | Reasoning |
|---|---|---|
| High-volume batch processing | 0 | Immediate cleanup to prevent accumulation |
| Standard batch Jobs | 300 (5 min) | Brief window for log inspection |
| Important Jobs needing debugging | 3600 (1 hour) | Enough time for engineers to investigate |
| Critical Jobs | 86400 (24 hours) | Full day for post-mortem analysis |
| No automatic cleanup | Omit the field | Manual cleanup required |

Immediate Cleanup

Setting ttlSecondsAfterFinished: 0 is ideal for high-throughput batch systems:

apiVersion: batch/v1
kind: Job
metadata:
  name: quick-task
spec:
  ttlSecondsAfterFinished: 0
  template:
    spec:
      containers:
        - name: task
          image: myapp/task:v1
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
      restartPolicy: Never

Note that "immediate" is not truly instant — the TTL controller runs periodically and there is a small delay between Job completion and deletion.
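You can observe this behavior directly. Assuming the quick-task manifest above is saved as quick-task.yaml (the filename is illustrative), watch the Job complete and then disappear:

```shell
# Apply the Job and stream its status; with TTL 0, expect the Job to
# vanish shortly after it reports completion -- not instantly, since
# the TTL controller processes deletions on its own cadence.
kubectl apply -f quick-task.yaml
kubectl get job quick-task -w

# Afterwards, confirm the Job's Pods were deleted along with it
kubectl get pods -l job-name=quick-task
```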

Using TTL with CronJobs

Apply TTL to CronJob-generated Jobs via the jobTemplate:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hourly-cleanup
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 600    # Clean up 10 min after each run
      template:
        spec:
          containers:
            - name: cleanup
              image: myapp/cleanup:v1
              resources:
                requests:
                  cpu: "250m"
                  memory: "256Mi"
          restartPolicy: OnFailure
      backoffLimit: 2
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1

Both ttlSecondsAfterFinished and CronJob history limits work independently. The TTL timer starts when the Job completes, while history limits are checked when new Jobs are created.
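To confirm the TTL actually propagated from the jobTemplate, you can inspect a Job the CronJob created. The Job name below is illustrative — CronJob-generated names follow the pattern <cronjob-name>-<schedule-id>:

```shell
# List Jobs spawned by the CronJob
kubectl get jobs

# Check that ttlSecondsAfterFinished was copied from jobTemplate
# into the child Job's spec (Job name here is illustrative)
kubectl get job hourly-cleanup-28912345 \
  -o jsonpath='{.spec.ttlSecondsAfterFinished}'
```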

Why Cleanup Matters

Without TTL cleanup, completed Jobs accumulate and cause real problems:

  1. etcd storage: Each Job object consumes space in etcd. Thousands of completed Jobs can significantly increase etcd database size.
  2. API performance: Listing Jobs with kubectl get jobs becomes slow as the count grows.
  3. Monitoring noise: Completed Jobs appear in dashboards and alerts.
  4. Pod objects: Completed Pods from Jobs with restartPolicy: Never remain visible, cluttering kubectl get pods output.

A few commands for gauging and manually clearing the backlog:

# See how many completed Jobs exist
kubectl get jobs --field-selector status.successful=1 --no-headers | wc -l

# Manual bulk cleanup (if TTL is not set)
kubectl delete jobs --field-selector status.successful=1

# Or use a label selector for specific CronJob-generated Jobs
kubectl delete jobs -l app=hourly-cleanup

Adding TTL to Existing Jobs

You can add ttlSecondsAfterFinished to an already-completed Job:

kubectl patch job report-generator -p '{"spec":{"ttlSecondsAfterFinished":60}}'

The TTL timer starts from the Job's completion time, not from when the field was added. If the Job completed more than 60 seconds ago, it is cleaned up on the next TTL controller reconciliation.
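The timer arithmetic can be sketched in a few lines — a minimal illustration of the rule above, not part of any Kubernetes client library:

```python
from datetime import datetime, timedelta, timezone

def eligible_for_deletion_at(completion_time: datetime, ttl_seconds: int) -> datetime:
    """A Job becomes eligible for deletion ttl_seconds after its
    status.completionTime, regardless of when the TTL field was set."""
    return completion_time + timedelta(seconds=ttl_seconds)

# Job completed at 12:00:00 UTC; a TTL of 60s was patched on at 12:05:00.
completed = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
patched_at = datetime(2024, 5, 1, 12, 5, 0, tzinfo=timezone.utc)

deadline = eligible_for_deletion_at(completed, 60)
print(deadline)               # 2024-05-01 12:01:00+00:00
print(patched_at > deadline)  # True: already past due, so the Job is
                              # deleted on the next reconciliation
```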

Limitations

  • TTL cleanup does not cascade to external resources (PVCs, ConfigMaps, Secrets created by the Job)
  • The TTL controller runs periodically, so cleanup is not guaranteed to be instantaneous
  • Once deleted, Job logs are lost — ensure logs are shipped to an external system before TTL expires
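If logs are not already shipped by a cluster-wide agent, a quick manual capture before the TTL expires might look like this (the Job name matches the earlier example; the PVC name is illustrative):

```shell
# Save the Job's Pod logs locally before the TTL controller deletes
# the Pods along with the Job
kubectl logs job/report-generator --all-containers > report-generator.log

# PVCs the Job used are NOT cleaned up by TTL; list and remove them
# explicitly if they are no longer needed
kubectl get pvc
kubectl delete pvc report-generator-scratch
```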

Why Interviewers Ask This

Interviewers ask this to check whether you know how to manage Job lifecycle cleanup in production, preventing resource leaks from thousands of completed Job objects.

Common Follow-Up Questions

What happens if you set ttlSecondsAfterFinished to 0?
The Job and its Pods are eligible for deletion immediately after finishing. However, there may be a brief delay as the TTL controller processes the cleanup.
Does TTL cleanup delete the Job's PVCs?
No, TTL cleanup only deletes the Job and its Pods. Any PersistentVolumeClaims created independently must be cleaned up separately.
How does TTL cleanup interact with CronJob history limits?
CronJob uses successfulJobsHistoryLimit and failedJobsHistoryLimit for cleanup. If TTL is also set on the Job template, whichever triggers first deletes the Job.

Key Takeaways

  • ttlSecondsAfterFinished provides automatic cleanup of completed Jobs and their Pods.
  • Setting it to 0 enables immediate cleanup, useful for high-volume batch processing.
  • Without TTL cleanup, completed Jobs accumulate in etcd and must be deleted manually.
