What Are Scheduler Profiles and Plugins in Kubernetes?
The Kubernetes scheduling framework provides extension points (Filter, Score, Reserve, Bind) where plugins hook into the scheduling pipeline. Scheduler profiles let you run multiple scheduling configurations in a single scheduler binary, each with different plugin combinations.
Detailed Answer
The Kubernetes scheduling framework is a pluggable architecture that lets you customize how Pods are assigned to nodes. Instead of modifying the scheduler source code, you enable, disable, or configure plugins at well-defined extension points.
Scheduling Cycle Extension Points
```
Pod arrives → QueueSort → PreFilter → Filter → PostFilter →
              PreScore → Score → NormalizeScore →
              Reserve → Permit → PreBind → Bind → PostBind
```
| Extension Point | Purpose | Example Plugin |
|----------------|---------|----------------|
| QueueSort | Order Pods in the scheduling queue | PrioritySort |
| PreFilter | Pre-process or check prerequisites | NodeResourcesFit |
| Filter | Eliminate unsuitable nodes | NodeAffinity, TaintToleration |
| PostFilter | Handle the case where no node passes Filter | DefaultPreemption |
| PreScore | Pre-process for scoring | InterPodAffinity |
| Score | Rank the remaining nodes | NodeResourcesBalancedAllocation |
| Reserve | Reserve resources on the selected node | VolumeBinding |
| Permit | Approve, deny, or delay binding | (none by default) |
| PreBind | Pre-binding actions | VolumeBinding |
| Bind | Bind the Pod to a node | DefaultBinder |
| PostBind | Post-binding cleanup | (none by default) |
Default Plugins
The default scheduler includes these plugins:
```yaml
# These are enabled by default
plugins:
  preFilter:
    - NodeResourcesFit
    - NodePorts
    - PodTopologySpread
    - InterPodAffinity
    - VolumeBinding
  filter:
    - NodeUnschedulable
    - NodeName
    - TaintToleration
    - NodeAffinity
    - NodeResourcesFit
    - VolumeBinding
    - PodTopologySpread
    - InterPodAffinity
  score:
    - NodeResourcesBalancedAllocation
    - ImageLocality
    - InterPodAffinity
    - NodeAffinity
    - PodTopologySpread
    - TaintToleration
```
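Plugins listed under `enabled` in a profile are added alongside these defaults rather than replacing them. To start from a clean slate at an extension point, the `"*"` wildcard under `disabled` removes every default plugin there; you can then opt back in explicitly (the profile name below is illustrative):

```yaml
profiles:
  - schedulerName: minimal-scheduler
    plugins:
      score:
        disabled:
          - name: "*"                # disable all default Score plugins
        enabled:
          - name: NodeResourcesFit   # re-enable only what you want
            weight: 1
```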
Configuring Scheduler Profiles
```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  # Default profile for general workloads
  - schedulerName: default-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesBalancedAllocation
            weight: 1
          - name: ImageLocality
            weight: 1
          - name: InterPodAffinity
            weight: 1
  # Profile optimized for batch workloads (pack tightly)
  - schedulerName: batch-scheduler
    plugins:
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 1
        disabled:
          - name: NodeResourcesBalancedAllocation
          - name: InterPodAffinity
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated  # Pack Pods tightly
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1
  # Profile for GPU workloads
  - schedulerName: gpu-scheduler
    plugins:
      filter:
        enabled:
          - name: NodeResourcesFit
      score:
        enabled:
          - name: NodeResourcesFit
            weight: 2
          - name: ImageLocality
            weight: 3  # Prefer nodes with GPU images cached
```
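The configuration file is handed to the scheduler via its `--config` flag. On kubeadm clusters this typically means mounting the file into the kube-scheduler static Pod and editing its command; the path below is illustrative:

```bash
kube-scheduler --config=/etc/kubernetes/scheduler-config.yaml
```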
Using Profiles
Direct a Pod to a specific profile by setting `spec.schedulerName`; note that a Pod naming a scheduler with no matching profile remains Pending indefinitely:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-job
spec:
  schedulerName: batch-scheduler  # Uses the batch profile
  containers:
    - name: worker
      image: batch-processor:1.0
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
```
Plugin Configuration
Many plugins accept configuration parameters:
```yaml
pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: LeastAllocated  # Spread workloads (default)
        # type: MostAllocated           # Pack workloads (bin-packing)
        # type: RequestedToCapacityRatio # Custom ratio
        resources:
          - name: cpu
            weight: 1
          - name: memory
            weight: 1
          - name: nvidia.com/gpu
            weight: 5  # Heavily weight GPU availability
  - name: PodTopologySpread
    args:
      defaultConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
      defaultingType: List
  - name: InterPodAffinity
    args:
      hardPodAffinityWeight: 1
```
Bin-Packing vs. Spreading
The scoring strategy dramatically changes Pod placement:
| Strategy | Behavior | Use Case |
|----------|----------|----------|
| LeastAllocated | Prefer emptier nodes | General workloads, resource headroom |
| MostAllocated | Prefer fuller nodes | Cost optimization, batch jobs |
| RequestedToCapacityRatio | Custom utilization targets | Fine-tuned balance |
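In simplified form, NodeResourcesFit scores each resource as `(capacity - requested) / capacity` under LeastAllocated and `requested / capacity` under MostAllocated, scaled to 0–100 (the real plugin also applies per-resource weights). A minimal sketch of the difference, with function names of our own choosing:

```go
package main

import "fmt"

// leastAllocated mirrors the LeastAllocated strategy for one resource:
// emptier nodes score higher. requested and capacity share a unit
// (e.g. milli-CPU).
func leastAllocated(requested, capacity int64) int64 {
	if capacity == 0 {
		return 0
	}
	return (capacity - requested) * 100 / capacity
}

// mostAllocated mirrors the MostAllocated strategy: fuller nodes score
// higher, which packs Pods tightly (bin-packing).
func mostAllocated(requested, capacity int64) int64 {
	if capacity == 0 {
		return 0
	}
	return requested * 100 / capacity
}

func main() {
	// Two nodes with 4000m CPU capacity: one 25% used, one 75% used.
	// The two strategies rank them in opposite order.
	for _, used := range []int64{1000, 3000} {
		fmt.Printf("requested=%dm least=%d most=%d\n",
			used, leastAllocated(used, 4000), mostAllocated(used, 4000))
	}
}
```

With 1000m requested of 4000m, LeastAllocated scores the node 75 and MostAllocated scores it 25; at 3000m the scores flip, which is why the strategy choice inverts placement behavior.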
Writing Custom Plugins
Custom plugins implement Go interfaces:
```go
// Simplified from k8s.io/kubernetes/pkg/scheduler/framework: the real
// interfaces also embed Plugin (which adds Name() string), and
// ScorePlugin additionally requires ScoreExtensions().
type FilterPlugin interface {
	Plugin
	Filter(ctx context.Context, state *CycleState,
		pod *v1.Pod, nodeInfo *NodeInfo) *Status
}

type ScorePlugin interface {
	Plugin
	Score(ctx context.Context, state *CycleState,
		pod *v1.Pod, nodeName string) (int64, *Status)
	ScoreExtensions() ScoreExtensions
}
```
Register and compile the plugin with the scheduler binary, then reference it in the profile configuration.
Monitoring Scheduler Performance
```bash
# Scheduler metrics
#   scheduler_scheduling_attempt_duration_seconds
#   scheduler_pending_pods
#   scheduler_schedule_attempts_total{result="scheduled|unschedulable|error"}

# Check which profile scheduled a Pod
kubectl get pod batch-job -o jsonpath='{.spec.schedulerName}'

# View scheduler logs
kubectl logs -n kube-system kube-scheduler-<node>
```
When to Customize Profiles
| Scenario | Customization |
|----------|--------------|
| Cost optimization | MostAllocated scoring (bin-packing) |
| GPU workloads | Custom plugin or high ImageLocality weight |
| Batch processing | Disable affinity scoring, use MostAllocated |
| Multi-tenant fairness | Custom QueueSort plugin |
| Latency-sensitive | Prefer nodes with warm caches (ImageLocality) |
Why Interviewers Ask This
Understanding the scheduling framework shows deep knowledge of how Kubernetes places Pods and how to customize it for specialized workloads without writing a scheduler from scratch.
Key Takeaways
- The scheduling framework replaces the older predicate/priority model with a plugin architecture.
- Profiles allow multiple scheduling behaviors in a single scheduler binary.
- You can enable, disable, or configure plugins per profile for workload-specific scheduling.