What Is Resource Bin Packing in Kubernetes?

advanced | scheduling | SRE | platform engineer | CKA
TL;DR

Resource bin packing is a scheduling strategy that packs Pods tightly onto fewer nodes to maximize utilization and reduce costs. It is the opposite of the default spreading behavior and is configured through the NodeResourcesFit scoring plugin with a MostAllocated strategy.

Detailed Answer

Resource bin packing is a scheduling strategy that consolidates Pods onto fewer nodes, maximizing resource utilization per node. This is the opposite of the default LeastAllocated strategy, which spreads Pods across nodes for headroom.

Default Behavior: Spreading

By default, the scheduler prefers nodes with the most free resources (LeastAllocated):

Node-1: [70% CPU used] ← Less preferred
Node-2: [30% CPU used] ← More preferred (scheduler picks this)
Node-3: [10% CPU used] ← Most preferred

This provides headroom for burst workloads but wastes resources when nodes sit partially utilized.

Bin Packing: MostAllocated

With bin packing, the scheduler prefers nodes that are already heavily utilized:

Node-1: [70% CPU used] ← Most preferred (scheduler picks this)
Node-2: [30% CPU used] ← Less preferred
Node-3: [10% CPU used] ← Least preferred → Can be removed by autoscaler
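The two strategies can be sketched as simple scoring functions. This is a simplified illustration, not the scheduler's exact implementation, but it has the same shape as the NodeResourcesFit formulas:

```python
# Simplified sketch of NodeResourcesFit scoring (illustrative, not the
# scheduler's exact code). Scores are on a 0-100 scale per resource.

def most_allocated_score(requested: float, allocatable: float) -> float:
    """Bin packing: higher score for nodes that are already more utilized."""
    return requested / allocatable * 100

def least_allocated_score(requested: float, allocatable: float) -> float:
    """Default spreading: higher score for nodes with more free capacity."""
    return (allocatable - requested) / allocatable * 100

nodes = {"node-1": 70, "node-2": 30, "node-3": 10}  # % CPU requested

# MostAllocated picks the busiest node; LeastAllocated picks the emptiest.
print(max(nodes, key=lambda n: most_allocated_score(nodes[n], 100)))   # node-1
print(max(nodes, key=lambda n: least_allocated_score(nodes[n], 100)))  # node-3
```

The two functions are mirror images: the same utilization data produces opposite placement decisions.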

Configuring Bin Packing

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

Resource Weighting

You can weight different resources to pack some more aggressively than others:

pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
          - name: cpu
            weight: 1
          - name: memory
            weight: 2         # Pack memory-heavy workloads more aggressively
          - name: nvidia.com/gpu
            weight: 10        # Strongly prefer nodes with GPUs already in use

Higher weight means the scheduler considers that resource more important when scoring nodes.
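A rough sketch of how weighting changes the outcome (the averaging below is illustrative; node names and numbers are hypothetical):

```python
# Illustrative sketch: the final node score is a weight-averaged blend of
# per-resource scores, so a weight of 10 on nvidia.com/gpu dominates the
# cpu/memory terms.

def weighted_most_allocated(usage: dict, weights: dict) -> float:
    """usage maps resource name -> fraction of allocatable requested."""
    total = sum(weights[r] * usage[r] * 100 for r in weights)
    return total / sum(weights.values())

weights = {"cpu": 1, "memory": 2, "nvidia.com/gpu": 10}
gpu_node = {"cpu": 0.3, "memory": 0.4, "nvidia.com/gpu": 0.5}
cpu_node = {"cpu": 0.8, "memory": 0.7, "nvidia.com/gpu": 0.0}

# The GPU node outscores the busier CPU node because GPU usage is weighted 10x.
print(weighted_most_allocated(gpu_node, weights))  # ~46.9
print(weighted_most_allocated(cpu_node, weights))  # ~16.9
```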

RequestedToCapacityRatio Strategy

For more fine-grained control, use RequestedToCapacityRatio with a custom scoring function:

pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: RequestedToCapacityRatio
        resources:
          - name: cpu
            weight: 1
          - name: memory
            weight: 1
        shape:
          - utilization: 0
            score: 0
          - utilization: 50
            score: 5
          - utilization: 100
            score: 10

This defines a scoring curve: nodes at 0% utilization score 0, at 50% score 5, and at 100% score 10. The linear increase means higher utilization is always preferred.
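The shape is evaluated by linear interpolation between the configured points, which a short sketch makes concrete (illustrative, not the scheduler's exact code):

```python
# Sketch of how a RequestedToCapacityRatio shape is evaluated: linear
# interpolation between the configured (utilization, score) points.

def shape_score(utilization: float, shape: list[tuple[float, float]]) -> float:
    for (u1, s1), (u2, s2) in zip(shape, shape[1:]):
        if u1 <= utilization <= u2:
            # Interpolate between the two surrounding shape points.
            return s1 + (s2 - s1) * (utilization - u1) / (u2 - u1)
    return shape[-1][1]

shape = [(0, 0), (50, 5), (100, 10)]  # the curve from the config above
print(shape_score(25, shape))   # 2.5
print(shape_score(75, shape))   # 7.5
```

A non-linear shape (for example, scoring 0 until 50% utilization) lets you pack only beyond a chosen threshold.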

Bin Packing + Cluster Autoscaler

The combination is powerful for cost optimization:

1. Bin packing concentrates Pods on fewer nodes
2. Some nodes become empty or nearly empty
3. Cluster autoscaler detects underutilized nodes
4. Autoscaler removes empty nodes
5. Cloud provider stops billing for removed instances

Configure the autoscaler to scale down aggressively:

# Cluster autoscaler configuration
--scale-down-utilization-threshold=0.5
--scale-down-delay-after-add=5m
--scale-down-unneeded-time=5m

Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| No headroom for burst | Reserve resources with LimitRanges or use burstable QoS |
| Node failure impacts many Pods | Use topology spread constraints across nodes |
| OOM kills on packed nodes | Set memory limits and monitor actual usage |
| Noisy neighbor effects | Use resource limits and Pod priority classes |
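As a concrete example of the topology-spread mitigation, a workload can require its replicas to stay spread across nodes even under a packing scheduler (the workload name and labels below are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api            # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-api
  template:
    metadata:
      labels:
        app: critical-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread across nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: critical-api
      containers:
        - name: api
          image: api:1.0
```

Spread constraints act as a filter before scoring, so they override bin-packing preferences where they conflict.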

Balancing Packing with Availability

Use separate scheduler profiles for different workload types:

profiles:
  # Bin-pack batch/ephemeral workloads
  - schedulerName: batch-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

  # Spread critical workloads
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

# Batch job uses bin-packing
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  template:
    spec:
      schedulerName: batch-scheduler
      containers:
        - name: processor
          image: processor:1.0
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
      restartPolicy: Never

Descheduler Integration

The Descheduler's HighNodeUtilization strategy complements bin packing by evicting Pods from underutilized nodes:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default
    pluginConfig:
      - name: HighNodeUtilization
        args:
          thresholds:
            cpu: 20       # percent of allocatable
            memory: 20
          # Evict Pods from nodes below 20% utilization
    plugins:
      balance:
        enabled:
          - HighNodeUtilization

Combined flow:

  1. HighNodeUtilization evicts Pods from sparsely used nodes
  2. MostAllocated scheduler reschedules them onto fuller nodes
  3. Cluster Autoscaler removes now-empty nodes
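The three-step flow above can be sketched as a toy simulation (purely illustrative; node names and the 100-unit capacity are made up):

```python
# Toy simulation of the evict -> repack -> scale-down loop (illustrative only).

def repack(nodes: dict[str, int], threshold: int = 20, capacity: int = 100) -> dict:
    """Evict load from nodes below threshold, repack onto the fullest nodes."""
    evicted = sum(u for u in nodes.values() if u < threshold)
    kept = {n: u for n, u in nodes.items() if u >= threshold}
    # Reschedule onto the most-allocated nodes first (MostAllocated scoring).
    for name in sorted(kept, key=kept.get, reverse=True):
        room = capacity - kept[name]
        moved = min(room, evicted)
        kept[name] += moved
        evicted -= moved
    return kept  # nodes dropped from the map are empty -> autoscaler removes them

print(repack({"node-1": 70, "node-2": 30, "node-3": 10}))
```

In this run node-3's load moves onto node-1, leaving node-3 empty and eligible for scale-down.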

Monitoring Packing Efficiency

# Check node utilization
kubectl top nodes

# View resource requests vs allocatable per node
kubectl describe nodes | grep -A 5 "Allocated resources"

# Calculate packing efficiency:
#   Efficiency = sum(Pod requests) / sum(node allocatable) × 100%
# Note: Pod resource requests are not stored on the Node object, so read
# the "Allocated resources" section from `kubectl describe nodes` above;
# for the denominator, list allocatable capacity per node:
kubectl get nodes -o json | jq '
  [.items[] | {
    name: .metadata.name,
    cpu_allocatable: .status.allocatable.cpu
  }]'
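Once you have per-node requested and allocatable values, the efficiency arithmetic is straightforward. A hypothetical helper that handles Kubernetes CPU quantities ("2" cores vs "500m" millicores):

```python
# Hypothetical helper: compute packing efficiency from per-node CPU
# requests and allocatable values as Kubernetes reports them ("2", "500m").

def cpu_to_millicores(value: str) -> int:
    """Convert a Kubernetes CPU quantity string to millicores."""
    return int(value[:-1]) if value.endswith("m") else int(float(value) * 1000)

def packing_efficiency(nodes: list[dict]) -> float:
    """Efficiency = sum(requests) / sum(allocatable) * 100."""
    requested = sum(cpu_to_millicores(n["requested"]) for n in nodes)
    allocatable = sum(cpu_to_millicores(n["allocatable"]) for n in nodes)
    return requested / allocatable * 100

nodes = [
    {"requested": "3500m", "allocatable": "4"},   # 87.5% packed
    {"requested": "500m", "allocatable": "4"},    # 12.5% packed
]
print(packing_efficiency(nodes))  # 50.0
```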

Best Practices

  1. Start with moderate packing — do not jump to 100% utilization immediately
  2. Monitor actual vs. requested — if actual usage is much lower than requests, right-size first
  3. Use PDBs on critical workloads — protect against disruption during rebalancing
  4. Keep topology spread constraints — packing onto nodes does not mean packing into one zone
  5. Test under failure conditions — simulate node failures to verify recovery time with packed clusters

Why Interviewers Ask This

In cloud environments, node costs are proportional to the number of running instances. Bin packing reduces waste by consolidating workloads, but it must be balanced against availability and performance.

Common Follow-Up Questions

How does bin packing interact with the cluster autoscaler?
Bin packing concentrates Pods on fewer nodes, making underutilized nodes empty. The cluster autoscaler then removes empty nodes, directly reducing infrastructure costs.
What are the risks of aggressive bin packing?
Tightly packed nodes leave no resource headroom for burst workloads. If a packed node fails, many Pods need to be rescheduled simultaneously, potentially causing cascading failures.
Can you bin-pack specific resource types differently?
Yes — you can assign different weights to CPU, memory, and extended resources. For example, weight GPU heavily to pack GPU workloads while spreading CPU-bound workloads.

Key Takeaways

  • Bin packing uses MostAllocated scoring to prefer nodes that are already heavily utilized.
  • It pairs well with the cluster autoscaler to reduce the number of running nodes.
  • Balance packing density with PDBs and topology spread to maintain availability.
