What Is Storage Capacity Tracking in Kubernetes?

advanced | storage | SRE | platform engineer | CKA
TL;DR

Storage capacity tracking lets the Kubernetes scheduler consider available storage capacity on nodes when placing Pods. It uses CSIStorageCapacity objects to inform scheduling decisions, preventing Pods from being scheduled on nodes where storage provisioning would fail.

Detailed Answer

Storage capacity tracking (GA since Kubernetes 1.24) informs the scheduler about available storage capacity on each node, so it can make better placement decisions for Pods that need dynamically provisioned volumes.

The Problem Without Capacity Tracking

Consider a cluster with local SSDs on each node. Without capacity tracking:

  1. A Pod requests a 100Gi PVC using a local StorageClass
  2. The scheduler picks a node based on CPU and memory, ignoring storage
  3. The CSI driver tries to provision 100Gi on that node
  4. The node only has 50Gi available, so provisioning fails
  5. The Pod is stuck in Pending

With capacity tracking, the scheduler knows the node only has 50Gi available and picks a different node.
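As a concrete sketch, the 100Gi claim from step 1 could look like this (the claim name is hypothetical; local-ssd matches the StorageClass used later in this article):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim        # hypothetical name for illustration
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-ssd
  resources:
    requests:
      storage: 100Gi      # exceeds the 50Gi free on the chosen node
```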

How It Works

CSI drivers publish CSIStorageCapacity objects to report available capacity per node or topology segment:

apiVersion: storage.k8s.io/v1
kind: CSIStorageCapacity
metadata:
  name: csi-local-capacity-node-1
  namespace: kube-system
storageClassName: local-ssd
nodeTopology:
  matchLabels:
    kubernetes.io/hostname: node-1
capacity: 500Gi
maximumVolumeSize: 500Gi

The scheduler reads these objects during its filtering phase: a node's topology segment is excluded when the PVC's requested size exceeds maximumVolumeSize (or capacity, when maximumVolumeSize is unset).

Enabling Storage Capacity Tracking

1. CSI Driver Configuration

The CSI driver must be deployed with capacity tracking enabled:

# CSI driver deployment (example with external-provisioner sidecar)
containers:
  - name: csi-provisioner
    image: registry.k8s.io/sig-storage/csi-provisioner:v3.6.0
    args:
      - --enable-capacity
      - --capacity-ownerref-level=2
      - --capacity-poll-interval=30s
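In addition to the sidecar flags, the driver's CSIDriver object must set storageCapacity: true, or the scheduler ignores the published capacity objects for this driver. A sketch using the example driver name from this article:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: local.csi.example.com
spec:
  storageCapacity: true      # tells the scheduler to consult CSIStorageCapacity
  volumeLifecycleModes:
    - Persistent
```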

2. StorageClass Configuration

The StorageClass must use WaitForFirstConsumer binding mode:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-ssd
provisioner: local.csi.example.com
volumeBindingMode: WaitForFirstConsumer

WaitForFirstConsumer delays PVC binding until a Pod using it is scheduled. This is required because the scheduler needs to evaluate capacity per node before deciding placement.

CSIStorageCapacity Objects

The CSI driver's external-provisioner sidecar automatically creates and updates these objects:

# List all capacity objects
kubectl get csistoragecapacity -n kube-system

# Inspect capacity for a specific driver
kubectl get csistoragecapacity -n kube-system \
  -o custom-columns=NAME:.metadata.name,CLASS:.storageClassName,CAPACITY:.capacity,NODE:.nodeTopology.matchLabels

Example output:

NAME                    CLASS      CAPACITY   NODE
csi-local-cap-node-1    local-ssd  500Gi      kubernetes.io/hostname=node-1
csi-local-cap-node-2    local-ssd  200Gi      kubernetes.io/hostname=node-2
csi-local-cap-node-3    local-ssd  0          kubernetes.io/hostname=node-3

Node-3 has no available capacity, so the scheduler will not place Pods requiring local-ssd volumes there.

Scheduling Flow with Capacity Tracking

1. Pod references PVC with WaitForFirstConsumer StorageClass
2. Scheduler evaluates nodes:
   a. Filter nodes by CPU, memory, taints, affinity (standard)
   b. Filter nodes by CSIStorageCapacity (new step)
   c. Score remaining nodes
3. Scheduler binds Pod to a node
4. PVC is bound to that node's topology
5. CSI driver provisions the volume on that node
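After step 5, the provisioned PersistentVolume typically carries node affinity pinning it to the chosen node. For a local-style driver it might look roughly like this (PV name and volume handle are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-example     # hypothetical; real PVs get generated names
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: local-ssd
  csi:
    driver: local.csi.example.com
    volumeHandle: vol-example       # illustrative handle
  nodeAffinity:                     # pins the volume to the scheduled node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1
```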

Use Cases

| Scenario | Why Capacity Tracking Helps |
|----------|-----------------------------|
| Local NVMe volumes | Prevents scheduling on nodes with full disks |
| LVM-based local storage | Respects volume group free space |
| Topology-constrained cloud storage | Avoids zones without available disk quotas |
| HPE, NetApp, Pure local tiers | Respects array capacity limits |

Example: Local Volume with Capacity Tracking

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-nvme
provisioner: topolvm.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-volume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-nvme
  resources:
    requests:
      storage: 100Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: database
spec:
  containers:
    - name: postgres
      image: postgres:16
      volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-volume

Debugging Capacity Issues

# Check if capacity objects exist
kubectl get csistoragecapacity -A

# Examine why a Pod is pending
kubectl describe pod database
# Look for: "node(s) did not have enough free storage"

# Check available capacity per node
kubectl get csistoragecapacity -n kube-system \
  --sort-by='.capacity' \
  -o custom-columns=NODE:.nodeTopology.matchLabels,CAP:.capacity

Why Interviewers Ask This

Without capacity tracking, Pods can be scheduled on nodes where local storage or topology-constrained volumes cannot be provisioned, leading to stuck Pods. This question tests advanced storage and scheduling knowledge.

Common Follow-Up Questions

What types of storage benefit from capacity tracking?
Local volumes, topology-constrained CSI volumes, and any storage that has per-node capacity limits. Network-attached storage with unlimited capacity typically does not need it.
How does the scheduler use CSIStorageCapacity objects?
The scheduler filters out nodes that lack sufficient capacity for the requested PVC. It only considers capacity tracking for StorageClasses that use WaitForFirstConsumer binding mode.
What happens if capacity tracking is not enabled?
The scheduler may place a Pod on a node where the CSI driver cannot provision storage. The Pod stays pending with a provisioning failure, requiring manual intervention.

Key Takeaways

  • Storage capacity tracking prevents scheduling Pods on nodes without sufficient storage capacity.
  • It works with CSIStorageCapacity objects published by CSI drivers.
  • Only relevant for WaitForFirstConsumer StorageClasses with topology constraints.
