What Are Indexed Jobs in Kubernetes?

Difficulty: Advanced | Tags: Jobs, DevOps, SRE, Backend Developer, Platform Engineer, CKA, CKAD
TL;DR

Indexed Jobs assign each Pod a unique completion index (0, 1, 2, ...) via the JOB_COMPLETION_INDEX environment variable. This allows each Pod to process a specific partition of work — like processing a specific shard of data or a specific chunk of a file — without needing an external work queue.

Detailed Answer

Indexed Jobs (introduced as alpha in Kubernetes 1.21 and stable since 1.24) provide a way to run parallel batch work where each Pod processes a known, fixed partition. Instead of all Pods pulling from a shared queue, each Pod is assigned a unique index and can determine its own work based on that index.

How It Works

Set completionMode: Indexed on the Job. The controller assigns each Pod a zero-based index, recorded in the batch.kubernetes.io/job-completion-index Pod annotation and exposed through the JOB_COMPLETION_INDEX environment variable:

apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  completionMode: Indexed
  completions: 10
  parallelism: 5
  template:
    spec:
      containers:
        - name: processor
          image: myapp/data-processor:v2
          command:
            - "python"
            - "process_shard.py"
            - "--shard-index=$(JOB_COMPLETION_INDEX)"
            - "--total-shards=10"
          env:
            # The controller injects JOB_COMPLETION_INDEX automatically for
            # Indexed Jobs; this explicit definition only illustrates where
            # the value comes from
            - name: JOB_COMPLETION_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "2Gi"
      restartPolicy: Never
  backoffLimit: 3

This creates 10 Pods in total, with at most 5 running concurrently, each assigned an index from 0 to 9. The Pod with index 3 processes shard 3 of the data.

Practical Example: Processing Data Shards

Suppose you have a database with 10 million records and want to process them in parallel across 10 Pods:

# process_shard.py
import os

shard_index = int(os.environ["JOB_COMPLETION_INDEX"])
total_shards = 10
total_records = 10_000_000

# Calculate this shard's half-open range [start, end)
records_per_shard = total_records // total_shards
start = shard_index * records_per_shard
# The last shard absorbs any remainder when the split is uneven
end = total_records if shard_index == total_shards - 1 else start + records_per_shard

print(f"Processing records {start} to {end}")
# Process records in range [start, end)

Each Pod handles exactly 1 million records, with no overlap and no coordination needed.
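This partitioning can be sanity-checked offline. The sketch below (standalone, not part of the Job manifest) verifies that an index-based split tiles the full record range with no gaps or overlap; one common way to handle an uneven split is to let the last shard absorb the remainder:

```python
def shard_range(shard_index, total_shards, total_records):
    """Half-open [start, end) range for a completion index; the last
    shard absorbs any remainder when the split is uneven."""
    per = total_records // total_shards
    start = shard_index * per
    if shard_index == total_shards - 1:
        return start, total_records
    return start, start + per

# The 10 shards tile the full 10M-record range: no gaps, no overlap
ranges = [shard_range(i, 10, 10_000_000) for i in range(10)]
assert ranges[0][0] == 0 and ranges[-1][1] == 10_000_000
assert all(ranges[i][1] == ranges[i + 1][0] for i in range(9))

# Uneven case: 7 shards over 100 records, the last shard takes the extra 2
assert shard_range(6, 7, 100) == (84, 100)
```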

Indexed Jobs vs Other Patterns

| Pattern | Coordination | Work Assignment | Use Case |
|---|---|---|---|
| Indexed Job | None needed | Static (by index) | Sharded data, file chunks |
| Work Queue Job | External queue | Dynamic (pull model) | Variable-size tasks |
| Fixed Completions | None | Independent (no assignment) | N identical tasks |

Pod Naming Convention

Indexed Job Pods include the index in their name:

kubectl get pods -l job-name=data-processor
# NAME                     READY   STATUS      RESTARTS   AGE
# data-processor-0-abc12   0/1     Completed   0          5m
# data-processor-1-def34   0/1     Completed   0          5m
# data-processor-2-ghi56   1/1     Running     0          3m
# data-processor-3-jkl78   1/1     Running     0          3m
# data-processor-4-mno90   1/1     Running     0          3m

Failure Handling for Indexed Jobs

When a Pod with a specific index fails, the Job controller creates a new Pod for the same index. The replacement Pod gets the same JOB_COMPLETION_INDEX value:

# Pod for index 2 fails
# data-processor-2-ghi56   0/1   Error   0   5m

# Controller creates a new Pod for index 2
# data-processor-2-pqr12   1/1   Running   0   10s

This ensures every index is completed, even if individual Pods fail.
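Because a replacement Pod reruns the exact same index, per-index work should be idempotent. A minimal sketch of one common approach, using a completion marker on a shared volume (the marker naming and done_dir layout are hypothetical, not a Kubernetes API):

```python
import os
import tempfile

def process_index(index, done_dir):
    """Skip work that a previous attempt for this index already finished."""
    marker = os.path.join(done_dir, f"index-{index}.done")
    if os.path.exists(marker):
        return "skipped"                  # earlier attempt completed this index
    # ... do the real work for this index here ...
    with open(marker, "w") as f:          # write the marker only after success
        f.write("ok")
    return "processed"

with tempfile.TemporaryDirectory() as done_dir:
    first = process_index(2, done_dir)    # original Pod for index 2
    retry = process_index(2, done_dir)    # replacement Pod after a failure
    print(first, retry)                   # -> processed skipped
```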

Using Index for Configuration

Beyond data sharding, the index can drive other per-Pod configuration:

# Each Pod processes a different file
command:
  - "process"
  - "/data/part-$(JOB_COMPLETION_INDEX).csv"

# Each Pod handles a different region
env:
  - name: REGION
    value: "region-$(JOB_COMPLETION_INDEX)"

# Each Pod listens on a different port, computed in the entrypoint shell
# (containerPort is an integer field and cannot be parameterized with env vars)
command:
  - "sh"
  - "-c"
  - "exec ./server --port=$((8080 + JOB_COMPLETION_INDEX))"

Combining with Volume Mounts

For large-scale data processing, each indexed Pod can mount a shared volume and process its assigned portion:

apiVersion: batch/v1
kind: Job
metadata:
  name: video-transcoder
spec:
  completionMode: Indexed
  completions: 50
  parallelism: 10
  template:
    spec:
      containers:
        - name: transcoder
          image: myapp/transcoder:v1
          command: ["./transcode.sh", "--chunk=$(JOB_COMPLETION_INDEX)"]
          volumeMounts:
            - name: video-data
              mountPath: /data
              readOnly: true
            - name: output
              mountPath: /output
          resources:
            requests:
              cpu: "4"
              memory: "4Gi"
            limits:
              cpu: "8"
              memory: "8Gi"
      volumes:
        - name: video-data
          persistentVolumeClaim:
            claimName: raw-videos
        - name: output
          persistentVolumeClaim:
            claimName: transcoded-output
      restartPolicy: Never
  backoffLimit: 5
  ttlSecondsAfterFinished: 300

This processes 50 video chunks, 10 at a time, with each Pod handling a specific chunk based on its index.
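As with the record shards earlier, each transcoder can derive its time window from the index alone. A sketch assuming fixed-length chunks over a known video duration (the 6000-second duration is illustrative):

```python
def chunk_window(index, total_chunks, duration_s):
    """Half-open [start, end) time window in seconds for one chunk."""
    per = duration_s / total_chunks
    return index * per, (index + 1) * per

# 50 chunks of a 100-minute (6000 s) video: 120 s each
start, end = chunk_window(3, 50, 6000)
print(start, end)  # -> 360.0 480.0
```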

Why Interviewers Ask This

Interviewers ask this to assess your knowledge of advanced batch processing patterns in Kubernetes and whether you can design parallel workloads without external coordination systems.

Common Follow-Up Questions

How does a Pod know its index?
The index is available via the JOB_COMPLETION_INDEX environment variable and also set in the Pod annotation batch.kubernetes.io/job-completion-index.
How do Indexed Jobs differ from work-queue Jobs?
Indexed Jobs assign fixed work partitions (each index processes a known subset). Work-queue Jobs have all Pods pulling from a shared queue dynamically.
Can you restart a specific index in an Indexed Job?
If a Pod for a specific index fails, the Job controller automatically creates a new Pod for that index. You cannot manually trigger a restart for one index.

Key Takeaways

  • Indexed Jobs use completionMode: Indexed to assign each Pod a unique zero-based index.
  • The index is available as the JOB_COMPLETION_INDEX environment variable.
  • This pattern eliminates the need for external work queues when work can be statically partitioned.
