What Is Resource Bin Packing in Kubernetes?

advanced | scheduling | SRE | platform engineer | CKA
TL;DR

Resource bin packing is a scheduling strategy that packs Pods tightly onto fewer nodes to maximize utilization and reduce costs. It is the opposite of the default spreading behavior and is configured through the NodeResourcesFit scoring plugin with a MostAllocated strategy.

Detailed Answer

Resource bin packing is a scheduling strategy that consolidates Pods onto fewer nodes, maximizing resource utilization per node. This is the opposite of the default LeastAllocated strategy, which spreads Pods across nodes for headroom.

Default Behavior: Spreading

By default, the scheduler prefers nodes with the most free resources (LeastAllocated):

Node-1: [70% CPU used] ← Less preferred
Node-2: [30% CPU used] ← More preferred (scheduler picks this)
Node-3: [10% CPU used] ← Most preferred

This provides headroom for burst workloads but wastes resources when nodes sit partially utilized.

Bin Packing: MostAllocated

With bin packing, the scheduler prefers nodes that are already heavily utilized:

Node-1: [70% CPU used] ← Most preferred (scheduler picks this)
Node-2: [30% CPU used] ← Less preferred
Node-3: [10% CPU used] ← Least preferred → Can be removed by autoscaler
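The two strategies can be sketched as simple scoring functions. This is a simplified illustration, not the scheduler's exact implementation, but it has the same shape as the NodeResourcesFit formulas:

```python
# Simplified sketch of NodeResourcesFit scoring (illustrative, not the
# scheduler's exact code). Scores are on a 0-100 scale per resource.

def most_allocated_score(requested: float, allocatable: float) -> float:
    """Bin packing: higher score for nodes that are already more utilized."""
    return requested / allocatable * 100

def least_allocated_score(requested: float, allocatable: float) -> float:
    """Default spreading: higher score for nodes with more free capacity."""
    return (allocatable - requested) / allocatable * 100

nodes = {"node-1": 70, "node-2": 30, "node-3": 10}  # % CPU requested

# MostAllocated picks the busiest node; LeastAllocated picks the emptiest.
print(max(nodes, key=lambda n: most_allocated_score(nodes[n], 100)))   # node-1
print(max(nodes, key=lambda n: least_allocated_score(nodes[n], 100)))  # node-3
```

The two functions are mirror images: the same utilization data produces opposite placement decisions.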

Configuring Bin Packing

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

Resource Weighting

You can weight different resources to pack some more aggressively than others:

pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated
        resources:
          - name: cpu
            weight: 1
          - name: memory
            weight: 2         # Pack memory-heavy workloads more aggressively
          - name: nvidia.com/gpu
            weight: 10        # Strongly prefer nodes with GPUs already in use

Higher weight means the scheduler considers that resource more important when scoring nodes.
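A rough sketch of how weighting changes the outcome (the averaging below is illustrative; node names and numbers are hypothetical):

```python
# Illustrative sketch: the final node score is a weight-averaged blend of
# per-resource scores, so a weight of 10 on nvidia.com/gpu dominates the
# cpu/memory terms.

def weighted_most_allocated(usage: dict, weights: dict) -> float:
    """usage maps resource name -> fraction of allocatable requested."""
    total = sum(weights[r] * usage[r] * 100 for r in weights)
    return total / sum(weights.values())

weights = {"cpu": 1, "memory": 2, "nvidia.com/gpu": 10}
gpu_node = {"cpu": 0.3, "memory": 0.4, "nvidia.com/gpu": 0.5}
cpu_node = {"cpu": 0.8, "memory": 0.7, "nvidia.com/gpu": 0.0}

# The GPU node outscores the busier CPU node because GPU usage is weighted 10x.
print(weighted_most_allocated(gpu_node, weights))  # ~46.9
print(weighted_most_allocated(cpu_node, weights))  # ~16.9
```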

RequestedToCapacityRatio Strategy

For more fine-grained control, use RequestedToCapacityRatio with a custom scoring function:

pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: RequestedToCapacityRatio
        resources:
          - name: cpu
            weight: 1
          - name: memory
            weight: 1
        shape:
          - utilization: 0
            score: 0
          - utilization: 50
            score: 5
          - utilization: 100
            score: 10

This defines a scoring curve: nodes at 0% utilization score 0, at 50% score 5, and at 100% score 10. The linear increase means higher utilization is always preferred.
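The shape is evaluated by linear interpolation between the configured points, which a short sketch makes concrete (illustrative, not the scheduler's exact code):

```python
# Sketch of how a RequestedToCapacityRatio shape is evaluated: linear
# interpolation between the configured (utilization, score) points.

def shape_score(utilization: float, shape: list[tuple[float, float]]) -> float:
    for (u1, s1), (u2, s2) in zip(shape, shape[1:]):
        if u1 <= utilization <= u2:
            # Interpolate between the two surrounding shape points.
            return s1 + (s2 - s1) * (utilization - u1) / (u2 - u1)
    return shape[-1][1]

shape = [(0, 0), (50, 5), (100, 10)]  # the curve from the config above
print(shape_score(25, shape))   # 2.5
print(shape_score(75, shape))   # 7.5
```

A non-linear shape (for example, scoring 0 until 50% utilization) lets you pack only beyond a chosen threshold.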

Bin Packing + Cluster Autoscaler

The combination is powerful for cost optimization:

1. Bin packing concentrates Pods on fewer nodes
2. Some nodes become empty or nearly empty
3. Cluster autoscaler detects underutilized nodes
4. Autoscaler removes empty nodes
5. Cloud provider stops billing for removed instances

Configure the autoscaler to scale down aggressively:

# Cluster autoscaler configuration
--scale-down-utilization-threshold=0.5
--scale-down-delay-after-add=5m
--scale-down-unneeded-time=5m

Risks and Mitigations

| Risk | Mitigation |
|------|------------|
| No headroom for burst | Reserve resources with LimitRanges or use burstable QoS |
| Node failure impacts many Pods | Use topology spread constraints across nodes |
| OOM kills on packed nodes | Set memory limits and monitor actual usage |
| Noisy neighbor effects | Use resource limits and Pod priority classes |
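As a concrete example of the topology-spread mitigation, a workload can require its replicas to stay spread across nodes even under a packing scheduler (the workload name and labels below are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: critical-api            # hypothetical workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: critical-api
  template:
    metadata:
      labels:
        app: critical-api
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname   # spread across nodes
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: critical-api
      containers:
        - name: api
          image: api:1.0
```

Spread constraints act as a filter before scoring, so they override bin-packing preferences where they conflict.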

Balancing Packing with Availability

Use separate scheduler profiles for different workload types:

profiles:
  # Bin-pack batch/ephemeral workloads
  - schedulerName: batch-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: MostAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

  # Spread critical workloads
  - schedulerName: default-scheduler
    pluginConfig:
      - name: NodeResourcesFit
        args:
          scoringStrategy:
            type: LeastAllocated
            resources:
              - name: cpu
                weight: 1
              - name: memory
                weight: 1

# Batch job uses bin-packing
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processing
spec:
  template:
    spec:
      schedulerName: batch-scheduler
      containers:
        - name: processor
          image: processor:1.0
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
      restartPolicy: Never

Descheduler Integration

The Descheduler's HighNodeUtilization strategy complements bin packing by evicting Pods from underutilized nodes:

apiVersion: "descheduler/v1alpha2"
kind: "DeschedulerPolicy"
profiles:
  - name: default
    pluginConfig:
      - name: HighNodeUtilization
        args:
          thresholds:
            cpu: 20       # percent of allocatable
            memory: 20
          # Evict Pods from nodes below 20% utilization
    plugins:
      balance:
        enabled:
          - HighNodeUtilization

Combined flow:

  1. HighNodeUtilization evicts Pods from sparsely used nodes
  2. MostAllocated scheduler reschedules them onto fuller nodes
  3. Cluster Autoscaler removes now-empty nodes
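The three-step flow above can be sketched as a toy simulation (purely illustrative; node names and the 100-unit capacity are made up):

```python
# Toy simulation of the evict -> repack -> scale-down loop (illustrative only).

def repack(nodes: dict[str, int], threshold: int = 20, capacity: int = 100) -> dict:
    """Evict load from nodes below threshold, repack onto the fullest nodes."""
    evicted = sum(u for u in nodes.values() if u < threshold)
    kept = {n: u for n, u in nodes.items() if u >= threshold}
    # Reschedule onto the most-allocated nodes first (MostAllocated scoring).
    for name in sorted(kept, key=kept.get, reverse=True):
        room = capacity - kept[name]
        moved = min(room, evicted)
        kept[name] += moved
        evicted -= moved
    return kept  # nodes dropped from the map are empty -> autoscaler removes them

print(repack({"node-1": 70, "node-2": 30, "node-3": 10}))
```

In this run node-3's load moves onto node-1, leaving node-3 empty and eligible for scale-down.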

Monitoring Packing Efficiency

# Check node utilization
kubectl top nodes

# View resource requests vs allocatable per node
kubectl describe nodes | grep -A 5 "Allocated resources"

# Calculate packing efficiency:
#   Efficiency = sum(Pod requests) / sum(node allocatable) × 100%
# Note: Pod resource requests are not stored on the Node object, so read
# the "Allocated resources" section from `kubectl describe nodes` above;
# for the denominator, list allocatable capacity per node:
kubectl get nodes -o json | jq '
  [.items[] | {
    name: .metadata.name,
    cpu_allocatable: .status.allocatable.cpu
  }]'
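Once you have per-node requested and allocatable values, the efficiency arithmetic is straightforward. A hypothetical helper that handles Kubernetes CPU quantities ("2" cores vs "500m" millicores):

```python
# Hypothetical helper: compute packing efficiency from per-node CPU
# requests and allocatable values as Kubernetes reports them ("2", "500m").

def cpu_to_millicores(value: str) -> int:
    """Convert a Kubernetes CPU quantity string to millicores."""
    return int(value[:-1]) if value.endswith("m") else int(float(value) * 1000)

def packing_efficiency(nodes: list[dict]) -> float:
    """Efficiency = sum(requests) / sum(allocatable) * 100."""
    requested = sum(cpu_to_millicores(n["requested"]) for n in nodes)
    allocatable = sum(cpu_to_millicores(n["allocatable"]) for n in nodes)
    return requested / allocatable * 100

nodes = [
    {"requested": "3500m", "allocatable": "4"},   # 87.5% packed
    {"requested": "500m", "allocatable": "4"},    # 12.5% packed
]
print(packing_efficiency(nodes))  # 50.0
```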

Best Practices

  1. Start with moderate packing — do not jump to 100% utilization immediately
  2. Monitor actual vs. requested — if actual usage is much lower than requests, right-size first
  3. Use PDBs on critical workloads — protect against disruption during rebalancing
  4. Keep topology spread constraints — packing onto nodes does not mean packing into one zone
  5. Test under failure conditions — simulate node failures to verify recovery time with packed clusters

Why Interviewers Ask This

In cloud environments, node costs are proportional to the number of running instances. Bin packing reduces waste by consolidating workloads, but it must be balanced against availability and performance.

Common Follow-Up Questions

How does bin packing interact with the cluster autoscaler?
Bin packing concentrates Pods on fewer nodes, making underutilized nodes empty. The cluster autoscaler then removes empty nodes, directly reducing infrastructure costs.
What are the risks of aggressive bin packing?
Tightly packed nodes leave no resource headroom for burst workloads. If a packed node fails, many Pods need to be rescheduled simultaneously, potentially causing cascading failures.
Can you bin-pack specific resource types differently?
Yes — you can assign different weights to CPU, memory, and extended resources. For example, weight GPU heavily to pack GPU workloads while spreading CPU-bound workloads.

Key Takeaways

  • Bin packing uses MostAllocated scoring to prefer nodes that are already heavily utilized.
  • It pairs well with the cluster autoscaler to reduce the number of running nodes.
  • Balance packing density with PDBs and topology spread to maintain availability.
