What Are Indexed Jobs in Kubernetes?

Difficulty: Advanced | Tags: Jobs, DevOps, SRE, Backend Developer, Platform Engineer, CKA, CKAD
TL;DR

Indexed Jobs assign each Pod a unique completion index (0, 1, 2, ...) via the JOB_COMPLETION_INDEX environment variable. This allows each Pod to process a specific partition of work — like processing a specific shard of data or a specific chunk of a file — without needing an external work queue.

Detailed Answer

Indexed Jobs (introduced as alpha in Kubernetes 1.21 and stable since 1.24) provide a way to run parallel batch work where each Pod processes a known, fixed partition. Instead of all Pods pulling from a shared queue, each Pod is assigned a unique index and can determine its own work based on that index.

How It Works

Set completionMode: Indexed on the Job. The controller assigns each Pod a zero-based index, recorded in the batch.kubernetes.io/job-completion-index Pod annotation and exposed through the JOB_COMPLETION_INDEX environment variable:

apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  completionMode: Indexed
  completions: 10
  parallelism: 5
  template:
    spec:
      containers:
        - name: processor
          image: myapp/data-processor:v2
          command:
            - "python"
            - "process_shard.py"
            - "--shard-index=$(JOB_COMPLETION_INDEX)"
            - "--total-shards=10"
          env:
            # The controller injects JOB_COMPLETION_INDEX automatically for
            # Indexed Jobs; this explicit definition only illustrates where
            # the value comes from
            - name: JOB_COMPLETION_INDEX
              valueFrom:
                fieldRef:
                  fieldPath: metadata.annotations['batch.kubernetes.io/job-completion-index']
          resources:
            requests:
              cpu: "1"
              memory: "1Gi"
            limits:
              cpu: "2"
              memory: "2Gi"
      restartPolicy: Never
  backoffLimit: 3

This creates 10 Pods in total, with at most 5 running concurrently, each assigned an index from 0 to 9. The Pod with index 3 processes shard 3 of the data.

Practical Example: Processing Data Shards

Suppose you have a database with 10 million records and want to process them in parallel across 10 Pods:

# process_shard.py
import os

shard_index = int(os.environ["JOB_COMPLETION_INDEX"])
total_shards = 10
total_records = 10_000_000

# Calculate this shard's half-open range [start, end)
records_per_shard = total_records // total_shards
start = shard_index * records_per_shard
# The last shard absorbs any remainder when the split is uneven
end = total_records if shard_index == total_shards - 1 else start + records_per_shard

print(f"Processing records {start} to {end}")
# Process records in range [start, end)

Each Pod handles exactly 1 million records, with no overlap and no coordination needed.
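This partitioning can be sanity-checked offline. The sketch below (standalone, not part of the Job manifest) verifies that an index-based split tiles the full record range with no gaps or overlap; one common way to handle an uneven split is to let the last shard absorb the remainder:

```python
def shard_range(shard_index, total_shards, total_records):
    """Half-open [start, end) range for a completion index; the last
    shard absorbs any remainder when the split is uneven."""
    per = total_records // total_shards
    start = shard_index * per
    if shard_index == total_shards - 1:
        return start, total_records
    return start, start + per

# The 10 shards tile the full 10M-record range: no gaps, no overlap
ranges = [shard_range(i, 10, 10_000_000) for i in range(10)]
assert ranges[0][0] == 0 and ranges[-1][1] == 10_000_000
assert all(ranges[i][1] == ranges[i + 1][0] for i in range(9))

# Uneven case: 7 shards over 100 records, the last shard takes the extra 2
assert shard_range(6, 7, 100) == (84, 100)
```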

Indexed Jobs vs Other Patterns

| Pattern | Coordination | Work Assignment | Use Case |
|---|---|---|---|
| Indexed Job | None needed | Static (by index) | Sharded data, file chunks |
| Work Queue Job | External queue | Dynamic (pull model) | Variable-size tasks |
| Fixed Completions | None | Independent (no assignment) | N identical tasks |

Pod Naming Convention

Indexed Job Pods include the index in their name:

kubectl get pods -l job-name=data-processor
# NAME                     READY   STATUS      RESTARTS   AGE
# data-processor-0-abc12   0/1     Completed   0          5m
# data-processor-1-def34   0/1     Completed   0          5m
# data-processor-2-ghi56   1/1     Running     0          3m
# data-processor-3-jkl78   1/1     Running     0          3m
# data-processor-4-mno90   1/1     Running     0          3m

Failure Handling for Indexed Jobs

When a Pod with a specific index fails, the Job controller creates a new Pod for the same index. The replacement Pod gets the same JOB_COMPLETION_INDEX value:

# Pod for index 2 fails
# data-processor-2-ghi56   0/1   Error   0   5m

# Controller creates a new Pod for index 2
# data-processor-2-pqr12   1/1   Running   0   10s

This ensures every index is completed, even if individual Pods fail.
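Because a replacement Pod reruns the exact same index, per-index work should be idempotent. A minimal sketch of one common approach, using a completion marker on a shared volume (the marker naming and done_dir layout are hypothetical, not a Kubernetes API):

```python
import os
import tempfile

def process_index(index, done_dir):
    """Skip work that a previous attempt for this index already finished."""
    marker = os.path.join(done_dir, f"index-{index}.done")
    if os.path.exists(marker):
        return "skipped"                  # earlier attempt completed this index
    # ... do the real work for this index here ...
    with open(marker, "w") as f:          # write the marker only after success
        f.write("ok")
    return "processed"

with tempfile.TemporaryDirectory() as done_dir:
    first = process_index(2, done_dir)    # original Pod for index 2
    retry = process_index(2, done_dir)    # replacement Pod after a failure
    print(first, retry)                   # -> processed skipped
```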

Using Index for Configuration

Beyond data sharding, the index can drive other per-Pod configuration:

# Each Pod processes a different file
command:
  - "process"
  - "/data/part-$(JOB_COMPLETION_INDEX).csv"

# Each Pod handles a different region
env:
  - name: REGION
    value: "region-$(JOB_COMPLETION_INDEX)"

# Each Pod listens on a different port, computed in the entrypoint shell
# (containerPort is an integer field and cannot be parameterized with env vars)
command:
  - "sh"
  - "-c"
  - "exec ./server --port=$((8080 + JOB_COMPLETION_INDEX))"

Combining with Volume Mounts

For large-scale data processing, each indexed Pod can mount a shared volume and process its assigned portion:

apiVersion: batch/v1
kind: Job
metadata:
  name: video-transcoder
spec:
  completionMode: Indexed
  completions: 50
  parallelism: 10
  template:
    spec:
      containers:
        - name: transcoder
          image: myapp/transcoder:v1
          command: ["./transcode.sh", "--chunk=$(JOB_COMPLETION_INDEX)"]
          volumeMounts:
            - name: video-data
              mountPath: /data
              readOnly: true
            - name: output
              mountPath: /output
          resources:
            requests:
              cpu: "4"
              memory: "4Gi"
            limits:
              cpu: "8"
              memory: "8Gi"
      volumes:
        - name: video-data
          persistentVolumeClaim:
            claimName: raw-videos
        - name: output
          persistentVolumeClaim:
            claimName: transcoded-output
      restartPolicy: Never
  backoffLimit: 5
  ttlSecondsAfterFinished: 300

This processes 50 video chunks, 10 at a time, with each Pod handling a specific chunk based on its index.
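As with the record shards earlier, each transcoder can derive its time window from the index alone. A sketch assuming fixed-length chunks over a known video duration (the 6000-second duration is illustrative):

```python
def chunk_window(index, total_chunks, duration_s):
    """Half-open [start, end) time window in seconds for one chunk."""
    per = duration_s / total_chunks
    return index * per, (index + 1) * per

# 50 chunks of a 100-minute (6000 s) video: 120 s each
start, end = chunk_window(3, 50, 6000)
print(start, end)  # -> 360.0 480.0
```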

Why Interviewers Ask This

Interviewers ask this to assess your knowledge of advanced batch processing patterns in Kubernetes and whether you can design parallel workloads without external coordination systems.

Common Follow-Up Questions

How does a Pod know its index?
The index is available via the JOB_COMPLETION_INDEX environment variable and also set in the Pod annotation batch.kubernetes.io/job-completion-index.
How do Indexed Jobs differ from work-queue Jobs?
Indexed Jobs assign fixed work partitions (each index processes a known subset). Work-queue Jobs have all Pods pulling from a shared queue dynamically.
Can you restart a specific index in an Indexed Job?
If a Pod for a specific index fails, the Job controller automatically creates a new Pod for that index. You cannot manually trigger a restart for one index.

Key Takeaways

  • Indexed Jobs use completionMode: Indexed to assign each Pod a unique zero-based index.
  • The index is available as the JOB_COMPLETION_INDEX environment variable.
  • This pattern eliminates the need for external work queues when work can be statically partitioned.
