What Are Scheduler Profiles and Plugins in Kubernetes?

advanced | scheduling | SRE | platform engineer | CKA
TL;DR

The Kubernetes scheduling framework provides extension points (Filter, Score, Reserve, Bind) where plugins hook into the scheduling pipeline. Scheduler profiles let you run multiple scheduling configurations in a single scheduler binary, each with different plugin combinations.

Detailed Answer

The Kubernetes scheduling framework is a pluggable architecture that lets you customize how Pods are assigned to nodes. Instead of modifying the scheduler source code, you enable, disable, or configure plugins at well-defined extension points.

Scheduling Cycle Extension Points

Pod arrives → QueueSort → PreFilter → Filter → PostFilter →
              PreScore → Score → NormalizeScore →
              Reserve → Permit → PreBind → Bind → PostBind

| Extension Point | Purpose | Example Plugin |
|-----------------|---------|----------------|
| QueueSort | Order Pods in the scheduling queue | PrioritySort |
| PreFilter | Pre-process or check prerequisites | NodeResourcesFit |
| Filter | Eliminate unsuitable nodes | NodeAffinity, TaintToleration |
| PostFilter | Handle the case where no node passes Filter | DefaultPreemption |
| PreScore | Pre-process for scoring | InterPodAffinity |
| Score | Rank remaining nodes | NodeResourcesBalancedAllocation |
| Reserve | Reserve resources on the selected node | VolumeBinding |
| Permit | Approve, deny, or delay binding | (none by default) |
| PreBind | Pre-binding actions | VolumeBinding |
| Bind | Bind Pod to node | DefaultBinder |
| PostBind | Post-binding cleanup | (none by default) |
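The filter-then-score flow above can be sketched in miniature. The following self-contained Go program is an illustration only, not the real framework API: the node data and the CPU-only rules are invented for the example. It mimics how Filter eliminates infeasible nodes and how a LeastAllocated-style Score plugin then picks the best survivor:

```go
package main

import "fmt"

type node struct {
	name    string
	freeCPU int64 // free millicores
	tainted bool  // stands in for a taint the Pod does not tolerate
}

// filter mimics the Filter extension point: drop nodes that cannot run the Pod.
func filter(nodes []node, reqCPU int64) []node {
	var feasible []node
	for _, n := range nodes {
		if !n.tainted && n.freeCPU >= reqCPU {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// pickBest mimics a LeastAllocated-style Score plugin: more free CPU wins.
func pickBest(nodes []node) node {
	best := nodes[0]
	for _, n := range nodes[1:] {
		if n.freeCPU > best.freeCPU {
			best = n
		}
	}
	return best
}

func main() {
	nodes := []node{
		{"node-a", 500, false},
		{"node-b", 4000, true}, // filtered: taint not tolerated
		{"node-c", 2000, false},
	}
	feasible := filter(nodes, 1000) // Pod requests 1000m CPU; node-a also filtered
	fmt.Println(len(feasible))
	fmt.Println(pickBest(feasible).name)
}
```

The real scheduler runs Filter plugins in sequence per node and sums weighted scores across Score plugins, but the shape of the pipeline is the same.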

Default Plugins

The default scheduler includes these plugins:

```yaml
# These are enabled by default
plugins:
  preFilter:
    - NodeResourcesFit
    - NodePorts
    - PodTopologySpread
    - InterPodAffinity
    - VolumeBinding
  filter:
    - NodeUnschedulable
    - NodeName
    - TaintToleration
    - NodeAffinity
    - NodeResourcesFit
    - VolumeBinding
    - PodTopologySpread
    - InterPodAffinity
  score:
    - NodeResourcesBalancedAllocation
    - ImageLocality
    - InterPodAffinity
    - NodeAffinity
    - PodTopologySpread
    - TaintToleration
```

Configuring Scheduler Profiles

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Default profile for general workloads
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesBalancedAllocation
            weight: 1
          - name: ImageLocality
            weight: 1
          - name: InterPodAffinity
            weight: 1

  # Profile optimized for batch workloads (pack tightly)
  - schedulerName: batch-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 1
        disabled:
          - name: NodeResourcesBalancedAllocation
          - name: InterPodAffinity
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated  # Pack Pods tightly
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

  # Profile for GPU workloads
  - schedulerName: gpu-scheduler
    plugins:
      filter:
        enabled:
          - name: NodeResourcesFit
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 2
          - name: ImageLocality
            weight: 3  # Prefer nodes with GPU images cached
```

Using Profiles

Direct a Pod to a specific profile by setting spec.schedulerName to the profile's name. If the name matches no profile (and no other scheduler is watching for it), the Pod stays Pending indefinitely:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  schedulerName: batch-scheduler   # Uses the batch profile
  containers:
    - name: worker
      image: batch-processor:1.0
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
```

Plugin Configuration

Many plugins accept configuration parameters:

```yaml
pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: LeastAllocated     # Spread workloads (default)
        # type: MostAllocated    # Pack workloads (bin-packing)
        # type: RequestedToCapacityRatio  # Custom ratio
        resources:
          - name: cpu
            weight: 1
          - name: memory
            weight: 1
          - name: nvidia.com/gpu
            weight: 5            # Heavily weight GPU availability

  - name: PodTopologySpread
    args:
      defaultConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
      defaultingType: List

  - name: InterPodAffinity
    args:
      hardPodAffinityWeight: 1
```
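The RequestedToCapacityRatio strategy additionally takes a shape that maps node utilization (percent) to a score from 0 to 10. The fragment below is a sketch; the shape points targeting roughly 80% utilization are illustrative, not a recommendation:

```yaml
pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: RequestedToCapacityRatio
        resources:
          - name: cpu
            weight: 1
        requestedToCapacityRatio:
          shape:                  # piecewise-linear score curve
            - utilization: 0
              score: 0
            - utilization: 80
              score: 10           # peak score near 80% utilization
            - utilization: 100
              score: 5
```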

Bin-Packing vs. Spreading

The scoring strategy dramatically changes Pod placement:

| Strategy | Behavior | Use Case |
|----------|----------|----------|
| LeastAllocated | Prefer emptier nodes | General workloads, resource headroom |
| MostAllocated | Prefer fuller nodes | Cost optimization, batch jobs |
| RequestedToCapacityRatio | Custom utilization targets | Fine-tuned balance |
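The difference is easy to see numerically. Kubernetes scores each node from 0 to 100 per resource; the sketch below simplifies the two formulas to a single resource and ignores per-resource weights:

```go
package main

import "fmt"

// leastAllocated: more free capacity → higher score (spreading).
func leastAllocated(requested, capacity int64) int64 {
	return (capacity - requested) * 100 / capacity
}

// mostAllocated: higher utilization → higher score (bin-packing).
func mostAllocated(requested, capacity int64) int64 {
	return requested * 100 / capacity
}

func main() {
	// A node with 8 CPUs where 7 would be requested after placing the Pod:
	fmt.Println(leastAllocated(7, 8)) // low score under spreading
	fmt.Println(mostAllocated(7, 8))  // high score under bin-packing
}
```

The same nearly-full node that LeastAllocated avoids is exactly the one MostAllocated prefers, which is why the strategy choice flips cluster-wide placement behavior.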

Writing Custom Plugins

Custom plugins implement Go interfaces from the scheduler framework package (simplified here: both interfaces embed the base Plugin interface, which provides Name(), and ScorePlugin additionally requires ScoreExtensions()):

```go
type FilterPlugin interface {
    Plugin
    Filter(ctx context.Context, state *CycleState,
           pod *v1.Pod, nodeInfo *NodeInfo) *Status
}

type ScorePlugin interface {
    Plugin
    Score(ctx context.Context, state *CycleState,
          pod *v1.Pod, nodeName string) (int64, *Status)
    ScoreExtensions() ScoreExtensions
}
```

Register the plugin with the scheduler command (via app.WithPlugin from k8s.io/kubernetes/cmd/kube-scheduler/app), compile the resulting custom scheduler binary, then reference the plugin by name in the profile configuration.
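For example, a custom plugin named GpuTopology (a hypothetical name, used here only for illustration) compiled into the binary would be wired into a profile like this:

```yaml
profiles:
  - schedulerName: custom-scheduler
    plugins:
      filter:
        enabled:
          - name: GpuTopology   # hypothetical custom plugin
      score:
        enabled:
          - name: GpuTopology
            weight: 2
```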

Monitoring Scheduler Performance

```bash
# Useful scheduler metrics (exposed on the scheduler's /metrics endpoint):
#   scheduler_scheduling_attempt_duration_seconds
#   scheduler_pending_pods
#   scheduler_schedule_attempts_total{result="scheduled|unschedulable|error"}

# Check which profile scheduled a Pod
kubectl get pod batch-job -o jsonpath='{.spec.schedulerName}'

# View scheduler logs
kubectl logs -n kube-system kube-scheduler-<node>
```

When to Customize Profiles

| Scenario | Customization |
|----------|---------------|
| Cost optimization | MostAllocated scoring (bin-packing) |
| GPU workloads | Custom plugin or high ImageLocality weight |
| Batch processing | Disable affinity scoring, use MostAllocated |
| Multi-tenant fairness | Custom QueueSort plugin |
| Latency-sensitive | Prefer nodes with warm caches (ImageLocality) |

Why Interviewers Ask This

Understanding the scheduling framework shows deep knowledge of how Kubernetes places Pods and how to customize it for specialized workloads without writing a scheduler from scratch.

Common Follow-Up Questions

What are the main extension points in the scheduling framework?
QueueSort, PreFilter, Filter, PostFilter, PreScore, Score, Reserve, Permit, PreBind, Bind, and PostBind. Each serves a specific phase of the scheduling cycle.
How do you disable a default plugin?
In the scheduler configuration, add the plugin to the disabled list under the relevant extension point.
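As a sketch, disabling the default PodTopologySpread scoring in a profile looks like this (the special name "*" disables every default plugin at that extension point):

```yaml
profiles:
  - schedulerName: default-scheduler
    plugins:
      score:
        disabled:
          - name: PodTopologySpread
          # - name: "*"   # or disable all default score plugins
```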
Can different Pods use different scheduler profiles?
Yes — set spec.schedulerName on the Pod to match the profile name. The scheduler routes the Pod to the matching profile.

Key Takeaways

  • The scheduling framework replaces the older predicate/priority model with a plugin architecture.
  • Profiles allow multiple scheduling behaviors in a single scheduler binary.
  • You can enable, disable, or configure plugins per profile for workload-specific scheduling.
