How Do Job Completions and Parallelism Work?

intermediate · jobs · devops · sre · backend developer · CKA · CKAD
TL;DR

The completions field specifies how many Pods must successfully complete for the Job to be considered done. The parallelism field controls how many Pods run concurrently. Together, they enable batch processing patterns — for example, completions: 10 with parallelism: 3 runs 10 tasks, 3 at a time.

Detailed Answer

Kubernetes Jobs support three patterns based on how you set completions and parallelism. Understanding these patterns is essential for designing efficient batch workloads.

Pattern 1: Single Pod (Default)

With no completions or parallelism specified, the Job runs a single Pod:

apiVersion: batch/v1
kind: Job
metadata:
  name: single-task
spec:
  template:
    spec:
      containers:
        - name: task
          image: myapp/task:v1
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"
      restartPolicy: Never

This is equivalent to completions: 1, parallelism: 1.

Pattern 2: Fixed Completions with Parallelism

Process N items with up to M concurrent Pods:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  completions: 10    # 10 Pods must succeed
  parallelism: 3     # Run up to 3 Pods at a time
  template:
    spec:
      containers:
        - name: processor
          image: myapp/batch-processor:v2
          command: ["python", "process.py"]
          resources:
            requests:
              cpu: "1"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "1Gi"
      restartPolicy: Never
  backoffLimit: 5

Execution flow:

  1. 3 Pods are created simultaneously (Pods 1-3)
  2. When Pod 1 completes, Pod 4 starts (maintaining 3 concurrent)
  3. This continues until 10 Pods have successfully completed
  4. Job status shows COMPLETIONS: 10/10

You can watch the progression with kubectl:

kubectl get job batch-processor -w
# NAME              COMPLETIONS   DURATION   AGE
# batch-processor   0/10          5s         5s
# batch-processor   1/10          12s        12s
# batch-processor   2/10          15s        15s
# ...
# batch-processor   10/10         55s        55s
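The scheduling behavior in the steps above can be sketched as a toy simulation (illustration only, not the real Job controller; the per-Pod runtimes are made up):

```python
import heapq

def simulate(completions, parallelism, durations):
    """Toy model of a Job: keep up to `parallelism` Pods running
    until `completions` Pods have finished. Returns finish order."""
    running = []   # min-heap of (finish_time, pod_index)
    finished = []
    created = 0
    now = 0
    while len(finished) < completions:
        # Top up to the parallelism limit while Pods remain to create.
        while len(running) < parallelism and created < completions:
            heapq.heappush(running, (now + durations[created], created))
            created += 1
        now, pod = heapq.heappop(running)
        finished.append(pod)
    return finished

# 10 tasks, 3 at a time, with hypothetical per-Pod runtimes in seconds.
order = simulate(10, 3, [5, 3, 4, 2, 6, 1, 3, 2, 4, 2])
```

Pods 0-2 start together; each completion immediately frees a slot for the next Pod, so concurrency never exceeds 3.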

Pattern 3: Work Queue (No Completions)

For work queue patterns where Pods pull from a shared queue until it is empty:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  parallelism: 5     # 5 concurrent workers
  # completions is not set
  template:
    spec:
      containers:
        - name: worker
          image: myapp/queue-worker:v1
          env:
            - name: QUEUE_URL
              value: "redis://redis:6379/0"
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
      restartPolicy: Never

Without completions, the Job creates parallelism Pods and is considered complete once at least one Pod exits successfully and all Pods have terminated. Each worker should:

  1. Pull a task from the queue
  2. Process it
  3. Repeat until the queue is empty
  4. Exit with code 0
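The loop above can be sketched as follows (a minimal Python sketch; the in-memory list stands in for the shared Redis queue at QUEUE_URL, and process is whatever work the container performs):

```python
def run_worker(queue, process):
    """Pull tasks until the queue is empty, then exit successfully.
    A real worker would pop atomically from a shared queue (e.g. Redis)
    so parallel Pods never process the same task twice."""
    while queue:             # 3. repeat until the queue is empty
        task = queue.pop(0)  # 1. pull a task from the queue
        process(task)        # 2. process it
    return 0                 # 4. exit code 0 marks this Pod as succeeded

done = []
exit_code = run_worker(["task-a", "task-b", "task-c"], done.append)
```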

Completions vs Parallelism Matrix

| completions | parallelism | Behavior |
|---|---|---|
| Not set (1) | Not set (1) | Single Pod |
| N | 1 | Sequential: N Pods, one at a time |
| N | M | Batch: N total, M concurrent |
| Not set | M | Work queue: M concurrent, done when one succeeds |

Adjusting Parallelism at Runtime

You can scale parallelism while a Job is running:

# Increase parallelism to speed up processing
kubectl patch job batch-processor -p '{"spec":{"parallelism":5}}'

# Decrease parallelism to free up cluster resources
kubectl patch job batch-processor -p '{"spec":{"parallelism":1}}'

# Pause the Job by setting parallelism to 0
kubectl patch job batch-processor -p '{"spec":{"parallelism":0}}'

Failure Handling with Parallelism

The backoffLimit applies to the total number of failures across all Pods, not per Pod:

spec:
  completions: 10
  parallelism: 3
  backoffLimit: 5    # Tolerate at most 5 Pod failures across the whole Job

If Pods keep failing (say Pods 1, 4, 6, 8, and 9 all fail), those failures exhaust the shared budget: once the total failure count crosses the backoffLimit, the Job is marked Failed even though other Pods succeeded. Set backoffLimit high enough to accommodate the failure rate you expect across all completions.
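A quick illustration of the shared failure budget (simplified; the real controller also applies an exponential back-off delay between Pod retries):

```python
def count_failures(pod_exit_codes):
    """backoffLimit counts failures across every Pod the Job created,
    not per Pod: sum all non-zero exits."""
    return sum(1 for code in pod_exit_codes if code != 0)

# Pods 1, 4, 6, 8 and 9 exit non-zero; the other five succeed.
failures = count_failures([1, 0, 0, 1, 0, 1, 0, 1, 1, 0])
backoff_limit = 5
budget_left = backoff_limit - failures  # 0: one more failure fails the Job
```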

Resource Planning

When using parallelism, plan for the total resource usage:

Per Pod: 1 CPU, 512Mi memory
Parallelism: 5
Total peak resources: 5 CPU, 2.5Gi memory

Ensure your cluster has enough capacity to schedule all parallel Pods, or some will remain Pending.
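The capacity arithmetic above, as a quick sanity check (working in millicores and MiB to avoid unit mix-ups; 1 CPU = 1000m, 1 Gi = 1024 Mi):

```python
def peak_usage(parallelism, pod_cpu_m, pod_mem_mi):
    """Worst-case resources when all parallel Pods run at once."""
    return parallelism * pod_cpu_m, parallelism * pod_mem_mi

cpu_m, mem_mi = peak_usage(5, 1000, 512)          # per Pod: 1 CPU, 512Mi
print(cpu_m / 1000, "CPU,", mem_mi / 1024, "Gi")  # 5.0 CPU, 2.5 Gi
```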

Why Interviewers Ask This

Interviewers ask this to evaluate your understanding of batch processing patterns in Kubernetes and how to configure Jobs for throughput and resource efficiency.

Common Follow-Up Questions

What happens if you set parallelism but not completions?
Without completions, the Job runs parallelism Pods and completes once any Pod succeeds and the remaining Pods have terminated. This is a work-queue style Job.
Can you change parallelism while a Job is running?
Yes, you can patch the parallelism field to increase or decrease the number of concurrent Pods while the Job is in progress.
What if one parallel Pod fails?
The Job controller creates a replacement Pod, still respecting the parallelism limit. Failures count toward the backoffLimit across all Pods.

Key Takeaways

  • completions defines the total number of successful Pod runs needed; parallelism defines the concurrency level.
  • Default values are completions: 1 and parallelism: 1, which runs a single Pod.
  • Work queue Jobs use parallelism without completions — Pods exit when the queue is empty.
