Kubernetes MemoryPressure

Causes and Fixes

MemoryPressure is a node condition that indicates the node is running low on available memory. When active, the kubelet begins evicting pods to reclaim memory, starting with BestEffort pods, then Burstable pods that exceed their requests. The node is also tainted to prevent new pods from being scheduled.

Symptoms

  • kubectl describe node shows MemoryPressure condition as True
  • Node has the taint node.kubernetes.io/memory-pressure:NoSchedule
  • Pods are being evicted from the node
  • kubectl top node shows high memory usage
  • New pods cannot be scheduled to the affected node
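For quick triage, the taint can be spotted straight from the node JSON. The jq filter below is a sketch run against a minimal sample of the `kubectl get nodes -o json` shape (the node names `node-a`/`node-b` are made up); on a live cluster, pipe `kubectl get nodes -o json` into the same filter.

```shell
# Minimal sample of the shape returned by: kubectl get nodes -o json
cat > /tmp/nodes.json <<'EOF'
{"items":[
  {"metadata":{"name":"node-a"},
   "spec":{"taints":[{"key":"node.kubernetes.io/memory-pressure","effect":"NoSchedule"}]}},
  {"metadata":{"name":"node-b"},
   "spec":{}}
]}
EOF

# Print only the nodes carrying the memory-pressure taint
jq -r '.items[]
  | select((.spec.taints // []) | any(.key == "node.kubernetes.io/memory-pressure"))
  | .metadata.name' /tmp/nodes.json
```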

Common Causes

1. Overcommitted memory on the node: total memory requests of all pods exceed the node's allocatable memory, and actual usage has caught up. Reduce requests or add capacity.
2. Pod without memory limits consuming too much: a pod without memory limits is using unbounded memory, starving other pods and the system. Use LimitRanges to enforce defaults.
3. Memory leak in an application: an application gradually consumes more memory until the node is under pressure. Identify the leaking pod with kubectl top.
4. System processes consuming memory: the kubelet, container runtime, or other system processes consume more memory than reserved, leaving less for pods.
5. Insufficient kube-reserved or system-reserved: not enough memory is reserved for the kubelet and system daemons, so pod allocations use memory that system processes need.

Step-by-Step Troubleshooting

1. Identify Nodes with MemoryPressure

# Check node conditions
kubectl get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[?(@.type=="MemoryPressure")].status'

# Get detailed conditions
kubectl describe node <node-name> | grep -A5 "Conditions"

The condition will show:

MemoryPressure   True   KubeletHasInsufficientMemory   ...

2. Check Node Memory Usage

# Check memory usage across nodes
kubectl top nodes

# Detailed memory info for the affected node
kubectl describe node <node-name> | grep -A15 "Allocated resources"

Look at the difference between allocatable memory, total requests, and actual usage.
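Allocatable and capacity are usually reported in kibibytes (e.g. `16218692Ki` from `kubectl get node <node-name> -o jsonpath='{.status.allocatable.memory}'`), which is awkward to compare with `kubectl top` output. A small awk sketch to convert (the sample value is made up):

```shell
# Sample allocatable value; substitute the real jsonpath output
alloc="16218692Ki"

# Convert Ki -> GiB for easier comparison with kubectl top
echo "$alloc" | awk '{ sub(/Ki$/, "", $1); printf "%.1f GiB allocatable\n", $1 / 1048576 }'
# → 15.5 GiB allocatable
```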

3. Identify Memory-Hungry Pods

Find which pods are consuming the most memory on the node.

# List pods on the node sorted by memory
kubectl top pods -A --sort-by=memory | head -20

# Filter to specific node
kubectl get pods -A --field-selector spec.nodeName=<node-name> -o wide
kubectl top pods -A --sort-by=memory --no-headers | while read ns pod cpu mem; do
  node=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}' 2>/dev/null)
  if [ "$node" = "<node-name>" ]; then
    echo "$ns $pod $mem"
  fi
done

4. Check for Pods Without Memory Limits

Pods without limits can consume unbounded memory.

# Find pods with at least one container lacking a memory limit on the node
kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json | \
  jq -r '.items[] | select(any(.spec.containers[]; .resources.limits.memory == null)) | "\(.metadata.namespace)/\(.metadata.name)"'

5. Check QoS Classes of Pods on the Node

kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json | \
  jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name): \(.status.qosClass)"' | sort -t: -k2

BestEffort pods are evicted first, then Burstable, then Guaranteed. If all your critical pods are BestEffort, set proper resource requests.

6. Check Eviction Thresholds

# Check kubelet configuration for eviction thresholds
kubectl get --raw /api/v1/nodes/<node-name>/proxy/configz 2>/dev/null | jq '.kubeletconfig | {evictionHard, evictionSoft}'

Default hard eviction thresholds:

  • memory.available < 100Mi

Soft eviction thresholds (none are set by default; each requires a matching grace period):

  • e.g. memory.available < 300Mi, with evictionSoftGracePeriod configured
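If you want earlier, gentler evictions, soft thresholds can be configured alongside the hard one in the KubeletConfiguration. A sketch with illustrative values (these are not defaults):

```yaml
# KubeletConfiguration fragment (restart the kubelet to apply)
evictionHard:
  memory.available: "100Mi"
evictionSoft:
  memory.available: "300Mi"       # triggers eviction only after the grace period
evictionSoftGracePeriod:
  memory.available: "1m30s"       # required for every soft threshold
evictionMaxPodGracePeriod: 60     # cap on pod termination grace during soft eviction
```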

7. Check System Reserved Memory

# Check reserved resources
kubectl get --raw /api/v1/nodes/<node-name>/proxy/configz 2>/dev/null | jq '.kubeletconfig | {systemReserved, kubeReserved}'

If system-reserved is not set, system processes compete with pods for memory. Configure it in kubelet:

# KubeletConfiguration (restart the kubelet after changing)
systemReserved:
  memory: "512Mi"
kubeReserved:
  memory: "256Mi"
evictionHard:
  memory.available: "200Mi"

8. Immediate Relief: Reduce Memory Usage

If the node is actively under pressure, take immediate action.

# Delete non-critical pods
kubectl delete pod <non-critical-pod> -n <namespace>

# Scale down deployments with pods on the node
kubectl scale deployment <deploy-name> -n <namespace> --replicas=<fewer>

# Cordon the node to prevent new scheduling
kubectl cordon <node-name>

9. Long-Term Fix: Set Memory Limits

Enforce default memory limits with a LimitRange. LimitRanges are namespaced, so apply one in each namespace that needs defaults.

apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults
  namespace: <namespace>
spec:
  limits:
    - default:
        memory: "512Mi"
      defaultRequest:
        memory: "256Mi"
      max:
        memory: "2Gi"
      min:
        memory: "64Mi"
      type: Container

kubectl apply -f limitrange.yaml

10. Scale the Cluster

If the cluster genuinely needs more memory capacity:

# Check Cluster Autoscaler status
kubectl get pods -n kube-system -l app=cluster-autoscaler

# Manually scale node group (cloud-specific)
# AWS EKS
eksctl scale nodegroup --cluster=<cluster> --name=<ng> --nodes=5

# GKE
gcloud container clusters resize <cluster> --node-pool=<pool> --num-nodes=5

11. Verify Resolution

# Check that MemoryPressure is cleared
kubectl describe node <node-name> | grep MemoryPressure

# Check the taint is removed
kubectl describe node <node-name> | grep memory-pressure

# Uncordon if previously cordoned
kubectl uncordon <node-name>

# Verify pods are running
kubectl get pods -A --field-selector spec.nodeName=<node-name>

The MemoryPressure condition should transition to False and the taint should be removed automatically once available memory rises above the threshold.

How to Explain This in an Interview

I would explain that MemoryPressure is a node condition, not a pod error. The kubelet monitors available memory and triggers this condition when it falls below the eviction threshold (default: 100Mi available). I would describe the eviction order based on QoS classes and how the kubelet uses soft and hard eviction thresholds. I would also discuss the difference between MemoryPressure (node-level) and OOMKilled (container-level), and how proper resource management with requests, limits, and LimitRanges prevents both.

Prevention

  • Set appropriate kube-reserved and system-reserved on each node
  • Enforce memory limits on all pods with LimitRanges
  • Monitor node memory usage and alert before pressure conditions
  • Use Cluster Autoscaler to add nodes when capacity is low
  • Right-size pods with VPA recommendations
  • Set memory requests equal to limits for critical workloads (Guaranteed QoS)
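For the last point, a pod is assigned the Guaranteed QoS class only when every container's requests equal its limits for both CPU and memory. A minimal sketch (the name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-app            # illustrative name
spec:
  containers:
    - name: app
      image: critical-app:latest   # illustrative image
      resources:
        requests:
          cpu: "500m"
          memory: "512Mi"
        limits:
          cpu: "500m"           # equal to requests -> Guaranteed QoS
          memory: "512Mi"
```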

Related Errors