What Is TTL-After-Finished for Jobs?

Level: intermediate | Tags: jobs, devops, sre, backend developer | Certifications: CKA, CKAD
TL;DR

The ttlSecondsAfterFinished field automatically deletes a Job and its Pods a specified number of seconds after the Job finishes, whether it succeeded or failed. This prevents completed Jobs from accumulating and consuming etcd storage.

Detailed Answer

By default, completed Kubernetes Jobs and their Pods remain in the cluster indefinitely. This is useful for debugging — you can inspect logs and exit codes — but it creates a maintenance burden. The ttlSecondsAfterFinished field solves this by automatically cleaning up finished Jobs.

Basic Usage

apiVersion: batch/v1
kind: Job
metadata:
  name: report-generator
spec:
  ttlSecondsAfterFinished: 3600    # Delete 1 hour after completion
  template:
    spec:
      containers:
        - name: generator
          image: myapp/report-gen:v2
          command: ["python", "generate_report.py"]
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "2Gi"
      restartPolicy: Never
  backoffLimit: 3

After the Job finishes (successfully or not), Kubernetes waits 3600 seconds, then deletes:

  • The Job object
  • All Pods created by the Job (both completed and failed)

TTL Values for Common Scenarios

| Scenario | TTL Value | Reasoning |
|---|---|---|
| High-volume batch processing | 0 | Immediate cleanup to prevent accumulation |
| Standard batch Jobs | 300 (5 min) | Brief window for log inspection |
| Important Jobs needing debugging | 3600 (1 hour) | Enough time for engineers to investigate |
| Critical Jobs | 86400 (24 hours) | Full day for post-mortem analysis |
| No automatic cleanup | Omit the field | Manual cleanup required |

Immediate Cleanup

Setting ttlSecondsAfterFinished: 0 is ideal for high-throughput batch systems:

apiVersion: batch/v1
kind: Job
metadata:
  name: quick-task
spec:
  ttlSecondsAfterFinished: 0
  template:
    spec:
      containers:
        - name: task
          image: myapp/task:v1
          resources:
            requests:
              cpu: "100m"
              memory: "128Mi"
      restartPolicy: Never

Note that "immediate" is not truly instant — the TTL controller runs periodically and there is a small delay between Job completion and deletion.
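You can observe this behavior directly. Assuming the quick-task manifest above is saved as quick-task.yaml (the filename is illustrative), watch the Job complete and then disappear:

```shell
# Apply the Job and stream its status; with TTL 0, expect the Job to
# vanish shortly after it reports completion -- not instantly, since
# the TTL controller processes deletions on its own cadence.
kubectl apply -f quick-task.yaml
kubectl get job quick-task -w

# Afterwards, confirm the Job's Pods were deleted along with it
kubectl get pods -l job-name=quick-task
```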

Using TTL with CronJobs

Apply TTL to CronJob-generated Jobs via the jobTemplate:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: hourly-cleanup
spec:
  schedule: "0 * * * *"
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 600    # Clean up 10 min after each run
      template:
        spec:
          containers:
            - name: cleanup
              image: myapp/cleanup:v1
              resources:
                requests:
                  cpu: "250m"
                  memory: "256Mi"
          restartPolicy: OnFailure
      backoffLimit: 2
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1

Both ttlSecondsAfterFinished and CronJob history limits work independently. The TTL timer starts when the Job completes, while history limits are checked when new Jobs are created.
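To confirm the TTL actually propagated from the jobTemplate, you can inspect a Job the CronJob created. The Job name below is illustrative — CronJob-generated names follow the pattern <cronjob-name>-<schedule-id>:

```shell
# List Jobs spawned by the CronJob
kubectl get jobs

# Check that ttlSecondsAfterFinished was copied from jobTemplate
# into the child Job's spec (Job name here is illustrative)
kubectl get job hourly-cleanup-28912345 \
  -o jsonpath='{.spec.ttlSecondsAfterFinished}'
```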

Why Cleanup Matters

Without TTL cleanup, completed Jobs accumulate and cause real problems:

  1. etcd storage: Each Job object consumes space in etcd. Thousands of completed Jobs can significantly increase etcd database size.
  2. API performance: Listing Jobs with kubectl get jobs becomes slow as the count grows.
  3. Monitoring noise: Completed Jobs appear in dashboards and alerts.
  4. Pod objects: Completed Pods from Jobs with restartPolicy: Never remain visible, cluttering kubectl get pods output.

A few commands for gauging and manually clearing the backlog:

# See how many completed Jobs exist
kubectl get jobs --field-selector status.successful=1 --no-headers | wc -l

# Manual bulk cleanup (if TTL is not set)
kubectl delete jobs --field-selector status.successful=1

# Or use a label selector for specific CronJob-generated Jobs
kubectl delete jobs -l app=hourly-cleanup

Adding TTL to Existing Jobs

You can add ttlSecondsAfterFinished to an already-completed Job:

kubectl patch job report-generator -p '{"spec":{"ttlSecondsAfterFinished":60}}'

The TTL timer starts from the Job's completion time, not from when the field was added. If the Job completed more than 60 seconds ago, it is cleaned up on the next TTL controller reconciliation.
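The timer arithmetic can be sketched in a few lines — a minimal illustration of the rule above, not part of any Kubernetes client library:

```python
from datetime import datetime, timedelta, timezone

def eligible_for_deletion_at(completion_time: datetime, ttl_seconds: int) -> datetime:
    """A Job becomes eligible for deletion ttl_seconds after its
    status.completionTime, regardless of when the TTL field was set."""
    return completion_time + timedelta(seconds=ttl_seconds)

# Job completed at 12:00:00 UTC; a TTL of 60s was patched on at 12:05:00.
completed = datetime(2024, 5, 1, 12, 0, 0, tzinfo=timezone.utc)
patched_at = datetime(2024, 5, 1, 12, 5, 0, tzinfo=timezone.utc)

deadline = eligible_for_deletion_at(completed, 60)
print(deadline)               # 2024-05-01 12:01:00+00:00
print(patched_at > deadline)  # True: already past due, so the Job is
                              # deleted on the next reconciliation
```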

Limitations

  • TTL cleanup does not cascade to external resources (PVCs, ConfigMaps, Secrets created by the Job)
  • The TTL controller runs periodically, so cleanup is not guaranteed to be instantaneous
  • Once deleted, Job logs are lost — ensure logs are shipped to an external system before TTL expires
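If logs are not already shipped by a cluster-wide agent, a quick manual capture before the TTL expires might look like this (the Job name matches the earlier example; the PVC name is illustrative):

```shell
# Save the Job's Pod logs locally before the TTL controller deletes
# the Pods along with the Job
kubectl logs job/report-generator --all-containers > report-generator.log

# PVCs the Job used are NOT cleaned up by TTL; list and remove them
# explicitly if they are no longer needed
kubectl get pvc
kubectl delete pvc report-generator-scratch
```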

Why Interviewers Ask This

Interviewers ask this to check whether you know how to manage Job lifecycle cleanup in production, preventing resource leaks from thousands of completed Job objects.

Common Follow-Up Questions

What happens if you set ttlSecondsAfterFinished to 0?
The Job and its Pods are eligible for deletion immediately after finishing. However, there may be a brief delay as the TTL controller processes the cleanup.
Does TTL cleanup delete the Job's PVCs?
No, TTL cleanup only deletes the Job and its Pods. Any PersistentVolumeClaims created independently must be cleaned up separately.
How does TTL cleanup interact with CronJob history limits?
CronJob uses successfulJobsHistoryLimit and failedJobsHistoryLimit for cleanup. If TTL is also set on the Job template, whichever triggers first deletes the Job.

Key Takeaways

  • ttlSecondsAfterFinished provides automatic cleanup of completed Jobs and their Pods.
  • Setting it to 0 enables immediate cleanup, useful for high-volume batch processing.
  • Without TTL cleanup, completed Jobs accumulate in etcd and must be deleted manually.
