Kubernetes DiskPressure

Causes and Fixes

DiskPressure is a node condition indicating that the node is running low on available disk space. While the condition is active, the kubelet stops accepting new pods, garbage-collects unused images and dead containers, and may evict pods to reclaim disk space. The condition can be triggered by either the node's root filesystem (nodefs) or the container image filesystem (imagefs).

Symptoms

  • kubectl describe node shows DiskPressure condition as True
  • Node has the taint node.kubernetes.io/disk-pressure:NoSchedule
  • New pods cannot be scheduled to the affected node
  • Pod events show eviction due to disk pressure
  • Container image pulls may fail on the affected node
  • kubectl get events shows 'NodeHasDiskPressure' warnings

Common Causes

1. Container logs filling disk
Containers writing excessive logs to stdout/stderr fill up the node's disk. The kubelet stores these logs on the node filesystem.
2. Unused container images
Accumulated old container images from frequent deployments consume disk space. Garbage collection thresholds may be too lenient.
3. emptyDir volumes consuming disk
Pods using emptyDir volumes without a sizeLimit can consume unbounded disk space on the node.
4. Local persistent volumes
Applications using local storage (hostPath or local PVs) fill up the node's disk.
5. Large container writable layers
Containers writing large files to the writable layer (not a volume) consume disk space in the container runtime's storage.
6. Node disk too small
The node's disk is undersized for the workload. Container images, logs, and emptyDir volumes all compete for space.
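
To spot the third cause (unbounded emptyDir volumes), you can scan pod specs for emptyDir entries with no sizeLimit. A sketch using jq against a sample pod list — the sample file and pod names are placeholders; against a live cluster you would feed it `kubectl get pods -A -o json` instead:

```shell
# Sample pod list standing in for `kubectl get pods -A -o json > /tmp/pods.json`
cat <<'EOF' > /tmp/pods.json
{"items":[
  {"metadata":{"namespace":"default","name":"cache-pod"},
   "spec":{"volumes":[{"name":"cache","emptyDir":{}}]}},
  {"metadata":{"namespace":"default","name":"bounded-pod"},
   "spec":{"volumes":[{"name":"scratch","emptyDir":{"sizeLimit":"500Mi"}}]}}
]}
EOF

# Print namespace/pod and each emptyDir volume that has no sizeLimit
jq -r '.items[]
  | .metadata as $m
  | (.spec.volumes // [])[]
  | select(.emptyDir != null and .emptyDir.sizeLimit == null)
  | "\($m.namespace)/\($m.name): \(.name)"' /tmp/pods.json
```

Here only cache-pod is reported, since bounded-pod's emptyDir already carries a sizeLimit.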

Step-by-Step Troubleshooting

1. Identify Nodes with DiskPressure

# Check all node conditions
kubectl get nodes -o custom-columns='NAME:.metadata.name,DISK_PRESSURE:.status.conditions[?(@.type=="DiskPressure")].status'

# Get details
kubectl describe node <node-name> | grep -A5 DiskPressure

2. Check Disk Usage on the Node

# Overall disk usage (the host filesystem is mounted at /host inside the debug pod)
kubectl debug node/<node-name> -it --image=ubuntu -- bash -c "df -h /host"

# Check specific directories (via the /host mount)
kubectl debug node/<node-name> -it --image=ubuntu -- bash -c "du -sh /host/var/lib/containerd /host/var/log/pods /host/var/lib/kubelet 2>/dev/null"

Key directories:

  • /var/lib/containerd or /var/lib/docker — Container images and writable layers
  • /var/log/pods — Container log files
  • /var/lib/kubelet — Kubelet data, emptyDir volumes

3. Check Container Log Sizes

Container logs are often the biggest disk consumer.

# Find the largest log files
kubectl debug node/<node-name> -it --image=ubuntu -- bash -c "find /host/var/log/pods -name '*.log' -exec ls -lhS {} + | head -20"

# Total log size
kubectl debug node/<node-name> -it --image=ubuntu -- bash -c "du -sh /host/var/log/pods"

4. Check Container Image Disk Usage

# List images and their sizes (crictl lives on the host, so chroot into /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host crictl images --no-trunc

# Check total image storage
kubectl debug node/<node-name> -it --image=ubuntu -- bash -c "du -sh /host/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"

5. Clean Up Unused Images

Trigger garbage collection of unused images.

# Remove unused images
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host crictl rmi --prune

# List the largest images (SIZE is the 4th column), then remove unused ones with crictl rmi <image>
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host sh -c "crictl images | sort -k4 -h -r | head -20"

6. Clean Up Dead Containers

# Remove stopped containers
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host sh -c 'crictl rm $(crictl ps -a -q --state exited)'

7. Delete Evicted Pods

Evicted pods remain in the API server as Failed pod objects and accumulate over time. Clean them up.

# Delete all evicted pods
kubectl get pods -A --field-selector=status.phase=Failed -o json | \
  jq -r '.items[] | select(.status.reason=="Evicted") | "\(.metadata.namespace) \(.metadata.name)"' | \
  while read ns name; do kubectl delete pod "$name" -n "$ns"; done

8. Set Log Rotation

Configure log rotation to prevent logs from consuming all disk space.

Kubelet log rotation settings (kubelet config):

containerLogMaxSize: "50Mi"
containerLogMaxFiles: 3

This limits each container log file to 50Mi and keeps at most 3 files per container (the current file plus rotated ones), capping total log usage per container at roughly 150Mi.

9. Set emptyDir Size Limits

Prevent pods from using unlimited disk via emptyDir volumes.

volumes:
  - name: cache
    emptyDir:
      sizeLimit: "500Mi"

When the emptyDir exceeds the size limit, the pod is evicted. This protects the node from individual pods consuming all disk space.

10. Set Ephemeral Storage Requests and Limits

Kubernetes can track and limit total ephemeral storage usage per pod (logs + emptyDir + writable layer).

resources:
  requests:
    ephemeral-storage: "1Gi"
  limits:
    ephemeral-storage: "2Gi"

When the pod exceeds the ephemeral storage limit, it is evicted.

11. Configure Image Garbage Collection

Adjust the kubelet's image garbage collection thresholds.

# kubelet configuration
imageGCHighThresholdPercent: 85  # Start GC when disk is 85% full
imageGCLowThresholdPercent: 80   # Stop GC when disk drops to 80%

More aggressive settings for small disks:

imageGCHighThresholdPercent: 70
imageGCLowThresholdPercent: 60

12. Resize the Node Disk

If the node disk is too small for the workload, resize it.

# AWS: Resize the EBS volume
aws ec2 modify-volume --volume-id <vol-id> --size 200

# Then grow the partition and extend the filesystem on the node
# (growpart is provided by cloud-guest-utils; use xfs_growfs instead of resize2fs for XFS)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host sh -c "growpart /dev/xvda 1 && resize2fs /dev/xvda1"

For managed Kubernetes services, update the node group configuration to use larger disks and replace nodes.

13. Verify Resolution

# Check DiskPressure is cleared
kubectl describe node <node-name> | grep DiskPressure
# Should show: DiskPressure  False

# Check taint is removed
kubectl describe node <node-name> | grep disk-pressure
# Should be empty

# Verify disk space (host filesystem is mounted at /host in the debug pod)
kubectl debug node/<node-name> -it --image=ubuntu -- bash -c "df -h /host"

# Verify new pods can be scheduled
kubectl run test --image=busybox --restart=Never --command -- sleep 10
kubectl get pod test -o wide
kubectl delete pod test

The DiskPressure condition should clear automatically once available disk space rises above the eviction threshold.

How to Explain This in an Interview

I would explain that DiskPressure is monitored by the kubelet against two filesystems: nodefs (the node's root filesystem where kubelet stores logs and local data) and imagefs (where the container runtime stores images and writable layers). Default thresholds are nodefs.available < 10% and imagefs.available < 15%. I would describe the garbage collection mechanism (images are collected when disk usage exceeds the high threshold, starting with least recently used images), and how to prevent disk pressure by limiting log sizes, setting emptyDir sizeLimit, and using appropriate node disk sizes.
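
These eviction thresholds are configurable in the kubelet configuration; a sketch showing the defaults mentioned above:

```yaml
# kubelet configuration (eviction thresholds; values shown are the defaults)
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"
```

Lowering these percentages makes the kubelet react earlier, at the cost of triggering garbage collection and evictions with more headroom still available.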

Prevention

  • Set log rotation policies and maximum log file sizes
  • Set sizeLimit on emptyDir volumes
  • Configure image garbage collection thresholds appropriately
  • Use ephemeral storage requests and limits on pods
  • Monitor node disk usage and alert at 70% utilization
  • Size node disks appropriately for the workload
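
The monitoring bullet could be implemented as a Prometheus alert; a sketch assuming node_exporter metrics (node_filesystem_avail_bytes, node_filesystem_size_bytes) are being scraped — the rule name, labels, and duration are illustrative:

```yaml
# Hypothetical Prometheus alert rule firing when a node's root filesystem exceeds 70% usage
groups:
  - name: node-disk
    rules:
      - alert: NodeDiskUsageHigh
        expr: |
          (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) > 0.70
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} root filesystem is over 70% full"
```

Alerting at 70% leaves time to act before the kubelet's default eviction threshold (nodefs.available below 10%) kicks in.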

Related Errors