How Do Job Completions and Parallelism Work?

intermediate · jobs · devops · sre · backend developer · CKA · CKAD
TL;DR

The completions field specifies how many Pods must successfully complete for the Job to be considered done. The parallelism field controls how many Pods run concurrently. Together, they enable batch processing patterns — for example, completions: 10 with parallelism: 3 runs 10 tasks, 3 at a time.

Detailed Answer

Kubernetes Jobs support three patterns based on how you set completions and parallelism. Understanding these patterns is essential for designing efficient batch workloads.

Pattern 1: Single Pod (Default)

With no completions or parallelism specified, the Job runs a single Pod:

apiVersion: batch/v1
kind: Job
metadata:
  name: single-task
spec:
  template:
    spec:
      containers:
        - name: task
          image: myapp/task:v1
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"
      restartPolicy: Never

This is equivalent to completions: 1, parallelism: 1.

Pattern 2: Fixed Completions with Parallelism

Process N items with up to M concurrent Pods:

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  completions: 10    # 10 Pods must succeed
  parallelism: 3     # Run up to 3 Pods at a time
  template:
    spec:
      containers:
        - name: processor
          image: myapp/batch-processor:v2
          command: ["python", "process.py"]
          resources:
            requests:
              cpu: "1"
              memory: "512Mi"
            limits:
              cpu: "2"
              memory: "1Gi"
      restartPolicy: Never
  backoffLimit: 5

Execution flow:

  1. 3 Pods are created simultaneously (Pods 1-3)
  2. When Pod 1 completes, Pod 4 starts (maintaining 3 concurrent)
  3. This continues until 10 Pods have successfully completed
  4. Job status shows COMPLETIONS: 10/10

You can watch the progression with kubectl:

kubectl get job batch-processor -w
# NAME              COMPLETIONS   DURATION   AGE
# batch-processor   0/10          5s         5s
# batch-processor   1/10          12s        12s
# batch-processor   2/10          15s        15s
# ...
# batch-processor   10/10         55s        55s
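The scheduling behavior in the steps above can be sketched as a toy simulation (illustration only, not the real Job controller; the per-Pod runtimes are made up):

```python
import heapq

def simulate(completions, parallelism, durations):
    """Toy model of a Job: keep up to `parallelism` Pods running
    until `completions` Pods have finished. Returns finish order."""
    running = []   # min-heap of (finish_time, pod_index)
    finished = []
    created = 0
    now = 0
    while len(finished) < completions:
        # Top up to the parallelism limit while Pods remain to create.
        while len(running) < parallelism and created < completions:
            heapq.heappush(running, (now + durations[created], created))
            created += 1
        now, pod = heapq.heappop(running)
        finished.append(pod)
    return finished

# 10 tasks, 3 at a time, with hypothetical per-Pod runtimes in seconds.
order = simulate(10, 3, [5, 3, 4, 2, 6, 1, 3, 2, 4, 2])
```

Pods 0-2 start together; each completion immediately frees a slot for the next Pod, so concurrency never exceeds 3.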

Pattern 3: Work Queue (No Completions)

For work queue patterns where Pods pull from a shared queue until it is empty:

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  parallelism: 5     # 5 concurrent workers
  # completions is not set
  template:
    spec:
      containers:
        - name: worker
          image: myapp/queue-worker:v1
          env:
            - name: QUEUE_URL
              value: "redis://redis:6379/0"
          resources:
            requests:
              cpu: "250m"
              memory: "256Mi"
      restartPolicy: Never

Without completions, the Job creates parallelism Pods and is considered complete once at least one Pod exits successfully and all Pods have terminated. Each worker should:

  1. Pull a task from the queue
  2. Process it
  3. Repeat until the queue is empty
  4. Exit with code 0
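The loop above can be sketched as follows (a minimal Python sketch; the in-memory list stands in for the shared Redis queue at QUEUE_URL, and process is whatever work the container performs):

```python
def run_worker(queue, process):
    """Pull tasks until the queue is empty, then exit successfully.
    A real worker would pop atomically from a shared queue (e.g. Redis)
    so parallel Pods never process the same task twice."""
    while queue:             # 3. repeat until the queue is empty
        task = queue.pop(0)  # 1. pull a task from the queue
        process(task)        # 2. process it
    return 0                 # 4. exit code 0 marks this Pod as succeeded

done = []
exit_code = run_worker(["task-a", "task-b", "task-c"], done.append)
```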

Completions vs Parallelism Matrix

| completions | parallelism | Behavior |
|---|---|---|
| Not set (1) | Not set (1) | Single Pod |
| N | 1 | Sequential: N Pods, one at a time |
| N | M | Batch: N total, M concurrent |
| Not set | M | Work queue: M concurrent, done when one succeeds |

Adjusting Parallelism at Runtime

You can scale parallelism while a Job is running:

# Increase parallelism to speed up processing
kubectl patch job batch-processor -p '{"spec":{"parallelism":5}}'

# Decrease parallelism to free up cluster resources
kubectl patch job batch-processor -p '{"spec":{"parallelism":1}}'

# Pause the Job by setting parallelism to 0
kubectl patch job batch-processor -p '{"spec":{"parallelism":0}}'

Failure Handling with Parallelism

The backoffLimit applies to the total number of failures across all Pods, not per Pod:

spec:
  completions: 10
  parallelism: 3
  backoffLimit: 5    # Tolerate at most 5 Pod failures across the whole Job

If Pods keep failing (say Pods 1, 4, 6, 8, and 9 all fail), those failures exhaust the shared budget: once the total failure count crosses the backoffLimit, the Job is marked Failed even though other Pods succeeded. Set backoffLimit high enough to accommodate the failure rate you expect across all completions.
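A quick illustration of the shared failure budget (simplified; the real controller also applies an exponential back-off delay between Pod retries):

```python
def count_failures(pod_exit_codes):
    """backoffLimit counts failures across every Pod the Job created,
    not per Pod: sum all non-zero exits."""
    return sum(1 for code in pod_exit_codes if code != 0)

# Pods 1, 4, 6, 8 and 9 exit non-zero; the other five succeed.
failures = count_failures([1, 0, 0, 1, 0, 1, 0, 1, 1, 0])
backoff_limit = 5
budget_left = backoff_limit - failures  # 0: one more failure fails the Job
```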

Resource Planning

When using parallelism, plan for the total resource usage:

Per Pod: 1 CPU, 512Mi memory
Parallelism: 5
Total peak resources: 5 CPU, 2.5Gi memory

Ensure your cluster has enough capacity to schedule all parallel Pods, or some will remain Pending.
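The capacity arithmetic above, as a quick sanity check (working in millicores and MiB to avoid unit mix-ups; 1 CPU = 1000m, 1 Gi = 1024 Mi):

```python
def peak_usage(parallelism, pod_cpu_m, pod_mem_mi):
    """Worst-case resources when all parallel Pods run at once."""
    return parallelism * pod_cpu_m, parallelism * pod_mem_mi

cpu_m, mem_mi = peak_usage(5, 1000, 512)          # per Pod: 1 CPU, 512Mi
print(cpu_m / 1000, "CPU,", mem_mi / 1024, "Gi")  # 5.0 CPU, 2.5 Gi
```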

Why Interviewers Ask This

Interviewers ask this to evaluate your understanding of batch processing patterns in Kubernetes and how to configure Jobs for throughput and resource efficiency.

Common Follow-Up Questions

What happens if you set parallelism but not completions?
Without completions, the Job runs parallelism Pods and completes once any Pod succeeds and the remaining Pods have terminated. This is a work-queue style Job.
Can you change parallelism while a Job is running?
Yes, you can patch the parallelism field to increase or decrease the number of concurrent Pods while the Job is in progress.
What if one parallel Pod fails?
The Job controller creates a replacement Pod, still respecting the parallelism limit. Failures count toward the backoffLimit across all Pods.

Key Takeaways

  • completions defines the total number of successful Pod runs needed; parallelism defines the concurrency level.
  • Default values are completions: 1 and parallelism: 1, which runs a single Pod.
  • Work queue Jobs use parallelism without completions — Pods exit when the queue is empty.
