How Do Custom Schedulers Work in Kubernetes?

advanced | architecture | sre | platform engineer | CKA
TL;DR

Kubernetes supports running multiple schedulers simultaneously. You can deploy a custom scheduler that implements specialized placement logic and direct specific Pods to use it via the schedulerName field.

Detailed Answer

The default Kubernetes scheduler handles most workloads well, but some scenarios require custom scheduling logic — GPU-aware placement, gang scheduling for batch jobs, or cost-optimized node selection. Kubernetes supports multiple schedulers running simultaneously.

Directing Pods to a Custom Scheduler

apiVersion: v1
kind: Pod
metadata:
  name: gpu-training
spec:
  schedulerName: gpu-scheduler
  containers:
    - name: trainer
      image: training-job:1.0
      resources:
        requests:
          cpu: "4"
          memory: "16Gi"
        limits:
          nvidia.com/gpu: "2"

If schedulerName is not set, the Pod uses default-scheduler. If the named scheduler does not exist, the Pod remains Pending indefinitely — no error is raised, so a typo in schedulerName is a common cause of stuck Pods.

Approaches to Custom Scheduling

There are three levels of customization, from simplest to most complex:

1. Scheduler Extenders (Deprecated Path)

Extenders are HTTP webhooks that the default scheduler calls at filter and score phases:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
extenders:
  - urlPrefix: "http://gpu-extender.kube-system:8080"
    filterVerb: "filter"
    prioritizeVerb: "prioritize"
    weight: 5
    enableHTTPS: false
    managedResources:
      - name: "nvidia.com/gpu"
        ignoredByScheduler: true

Extenders are simple but add HTTP round-trip latency to every scheduling decision.
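To make the protocol concrete, here is a minimal sketch of a filter extender in Go. It uses simplified local stand-ins for the wire types (the real definitions live in k8s.io/kube-scheduler/extender/v1 as ExtenderArgs and ExtenderFilterResult); the struct fields, label key, and handler here are illustrative, not the exact upstream schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Simplified stand-ins for the extender wire types
// (real versions: k8s.io/kube-scheduler/extender/v1).
type Node struct {
	Name   string            `json:"name"`
	Labels map[string]string `json:"labels"`
}

type ExtenderArgs struct {
	Nodes []Node `json:"nodes"`
}

type ExtenderFilterResult struct {
	NodeNames   []string          `json:"nodeNames"`
	FailedNodes map[string]string `json:"failedNodes"`
}

// filterGPUNodes keeps nodes that advertise a gpu-type label and
// records a reason for every node it rejects.
func filterGPUNodes(nodes []Node) ExtenderFilterResult {
	result := ExtenderFilterResult{FailedNodes: map[string]string{}}
	for _, n := range nodes {
		if _, ok := n.Labels["gpu-type"]; ok {
			result.NodeNames = append(result.NodeNames, n.Name)
		} else {
			result.FailedNodes[n.Name] = "node has no gpu-type label"
		}
	}
	return result
}

// filterHandler is what the scheduler's filterVerb URL would hit.
func filterHandler(w http.ResponseWriter, r *http.Request) {
	var args ExtenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	json.NewEncoder(w).Encode(filterGPUNodes(args.Nodes))
}

func main() {
	// Exercise the handler in-process instead of binding a real port.
	srv := httptest.NewServer(http.HandlerFunc(filterHandler))
	defer srv.Close()

	body, _ := json.Marshal(ExtenderArgs{Nodes: []Node{
		{Name: "gpu-node-1", Labels: map[string]string{"gpu-type": "a100"}},
		{Name: "cpu-node-1"},
	}})
	resp, _ := http.Post(srv.URL, "application/json", bytes.NewReader(body))
	var result ExtenderFilterResult
	json.NewDecoder(resp.Body).Decode(&result)
	fmt.Println(result.NodeNames) // [gpu-node-1]
}
```

In production the extender would be a long-running HTTP server at the urlPrefix configured above, and each request/response round trip adds to scheduling latency — which is exactly the drawback noted here.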

2. Scheduling Framework Plugins (Recommended)

The scheduling framework (stable since Kubernetes 1.19) defines extension points throughout the scheduling cycle:

Scheduling Cycle:
  PreFilter → Filter → PostFilter → PreScore → Score → Reserve → Permit

Binding Cycle:
  PreBind → Bind → PostBind

You implement a Go interface for the desired extension point:

package main

import (
    "context"

    v1 "k8s.io/api/core/v1"
    "k8s.io/kubernetes/pkg/scheduler/framework"
)

// GPUAwarePlugin scores nodes higher when their GPU type matches
// the Pod's preference. The framework.Handle provides access to
// the scheduler's node snapshot.
type GPUAwarePlugin struct {
    handle framework.Handle
}

func (p *GPUAwarePlugin) Name() string {
    return "GPUAware"
}

func (p *GPUAwarePlugin) Score(
    ctx context.Context,
    state *framework.CycleState,
    pod *v1.Pod,
    nodeName string,
) (int64, *framework.Status) {
    // Look up the node in the scheduler's snapshot rather than
    // reading it from CycleState.
    nodeInfo, err := p.handle.SnapshotSharedLister().NodeInfos().Get(nodeName)
    if err != nil {
        return 0, framework.AsStatus(err)
    }

    // Prefer nodes whose gpu-type label matches the Pod's annotation.
    gpuType := nodeInfo.Node().Labels["gpu-type"]
    requestedGPU := pod.Annotations["preferred-gpu-type"]
    if gpuType == requestedGPU {
        return 100, nil
    }
    return 50, nil
}

func (p *GPUAwarePlugin) ScoreExtensions() framework.ScoreExtensions {
    return nil
}

Out-of-tree plugins like this are compiled into a scheduler binary and registered by name at startup, typically via app.NewSchedulerCommand with app.WithPlugin from k8s.io/kubernetes/cmd/kube-scheduler/app.
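To see why plugin weights matter, here is a toy, dependency-free simulation of how the framework combines scores from multiple Score plugins — weighted scores are summed per node and the highest total wins. This is not the real framework API (which also runs NormalizeScore before weighting); the types and node names below are illustrative.

```go
package main

import "fmt"

// scorePlugin is a toy stand-in for a Score plugin: a name,
// a configured weight, and a scoring function per node.
type scorePlugin struct {
	name   string
	weight int64
	score  func(node string) int64
}

// pickNode mimics the framework's score aggregation: sum each
// plugin's weighted score per node and select the best node.
func pickNode(nodes []string, plugins []scorePlugin) string {
	best, bestScore := "", int64(-1)
	for _, n := range nodes {
		var total int64
		for _, p := range plugins {
			total += p.weight * p.score(n)
		}
		if total > bestScore {
			best, bestScore = n, total
		}
	}
	return best
}

func main() {
	gpuAware := scorePlugin{"GPUAware", 2, func(n string) int64 {
		if n == "gpu-node-1" {
			return 100 // matching GPU type
		}
		return 50
	}}
	resourcesFit := scorePlugin{"NodeResourcesFit", 1, func(n string) int64 {
		return 60 // pretend all nodes fit equally well
	}}

	nodes := []string{"cpu-node-1", "gpu-node-1"}
	fmt.Println(pickNode(nodes, []scorePlugin{gpuAware, resourcesFit}))
	// gpu-node-1 wins: 2*100 + 1*60 = 260 vs 2*50 + 1*60 = 160
}
```

Raising a plugin's weight in KubeSchedulerConfiguration amplifies its influence in exactly this way.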

3. Scheduler Profiles (Multiple Schedulers in One Binary)

Instead of deploying multiple scheduler binaries, you can configure multiple profiles in a single scheduler:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
          - name: InterPodAffinity
  - schedulerName: gpu-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
          - name: GPUAware
        disabled:
          - name: InterPodAffinity
  - schedulerName: batch-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
      preFilter:
        enabled:
          - name: GangScheduling

Pods set schedulerName to default-scheduler, gpu-scheduler, or batch-scheduler, and the single scheduler binary routes them to the appropriate profile.
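For example, a batch Pod opts into the batch-scheduler profile purely through its schedulerName (the image name here is illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  schedulerName: batch-scheduler
  containers:
    - name: worker
      image: batch-worker:1.0
```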

Deploying a Custom Scheduler

When running a fully custom scheduler as a separate Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: custom-scheduler
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      component: custom-scheduler
  template:
    metadata:
      labels:
        component: custom-scheduler
    spec:
      serviceAccountName: custom-scheduler
      containers:
        - name: scheduler
          image: my-custom-scheduler:1.0
          command:
            - /usr/local/bin/kube-scheduler
            - --config=/etc/scheduler/config.yaml
            - --leader-elect=true
            - --leader-elect-resource-name=custom-scheduler
          volumeMounts:
            - name: config
              mountPath: /etc/scheduler
          resources:
            requests:
              cpu: "100m"
              memory: "256Mi"
      volumes:
        - name: config
          configMap:
            name: custom-scheduler-config

RBAC for Custom Schedulers

The scheduler needs permissions to read Pods, Nodes, and PersistentVolumes, and to create Bindings and Events:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: custom-scheduler
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "update", "patch"]
  - apiGroups: [""]
    resources: ["pods/binding"]
    verbs: ["create"]
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes", "persistentvolumeclaims"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "patch"]
  - apiGroups: ["coordination.k8s.io"]
    resources: ["leases"]
    verbs: ["get", "create", "update"]

When to Use Custom Scheduling

| Use Case | Approach |
|----------|----------|
| Simple priority adjustments | Pod priority classes (no custom scheduler needed) |
| GPU or hardware-aware placement | Scheduler plugin or extender |
| Gang scheduling (all-or-nothing) | Custom scheduler with coscheduling plugin |
| Cost-optimized spot instance placement | Score plugin preferring cheaper nodes |
| Multi-tenant fairness | Custom queue-based scheduler |
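For the first row — when plain priority is enough — a PriorityClass avoids custom scheduling entirely. A sketch, with illustrative names and value:

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
globalDefault: false
description: "High-priority workloads that may preempt lower-priority Pods."
---
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: high-priority
  containers:
    - name: app
      image: critical-app:1.0
```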

Debugging Custom Schedulers

# Check which scheduler is assigned to a Pod
kubectl get pod gpu-training -o jsonpath='{.spec.schedulerName}'

# Check scheduler logs
kubectl logs -n kube-system -l component=custom-scheduler

# Verify the scheduler is running
kubectl get pods -n kube-system -l component=custom-scheduler

# Check for scheduling events
kubectl describe pod gpu-training | grep -A 5 Events

Why Interviewers Ask This

This question evaluates your understanding of the scheduler's extensibility model, which is relevant for specialized workloads like GPU scheduling, batch processing, and multi-tenant clusters.

Common Follow-Up Questions

How do you tell a Pod to use a custom scheduler?
Set spec.schedulerName to the name of your custom scheduler. If omitted, the default scheduler (default-scheduler) is used.
What is the difference between a custom scheduler and a scheduler extender?
A custom scheduler replaces the scheduling pipeline entirely. A scheduler extender adds filter/score steps to the default scheduler via HTTP callbacks — simpler but less flexible.
What are scheduler plugins and profiles?
Stable since 1.19, the scheduling framework lets you write plugins that hook into specific phases (Filter, Score, Bind). Profiles let a single scheduler binary run multiple scheduling configurations.

Key Takeaways

  • Use the schedulerName field on a Pod to direct it to a specific scheduler.
  • The scheduling framework with plugins is the modern approach — prefer it over writing a scheduler from scratch.
  • Multiple scheduler profiles can run in a single scheduler binary, reducing operational complexity.
