Kubernetes MemoryPressure
Causes and Fixes
MemoryPressure is a node condition that indicates the node is running low on available memory. When active, the kubelet begins evicting pods to reclaim memory, starting with BestEffort pods, then Burstable pods that exceed their requests. The node is also tainted to prevent new pods from being scheduled.
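The eviction ordering can be sketched as a small ranking function. This is a simplified, illustrative model of the kubelet's behavior, not its actual code: pods whose memory usage exceeds their requests are evicted first, ordered by pod priority and then by how far usage exceeds requests.

```python
# Simplified model of kubelet memory-eviction ranking (illustrative, not
# the real kubelet implementation). Pods whose usage exceeds their requests
# come first, ordered by priority (lower first), then by overage (larger first).

def eviction_order(pods):
    """pods: list of dicts with 'name', 'usage_mi', 'requests_mi', 'priority'."""
    def key(p):
        over = p["usage_mi"] - p["requests_mi"]
        # Group 0: usage exceeds requests (evicted first); within a group,
        # lower priority first, then larger overage first.
        return (0 if over > 0 else 1, p["priority"], -over)
    return [p["name"] for p in sorted(pods, key=key)]

pods = [
    {"name": "besteffort", "usage_mi": 300, "requests_mi": 0,   "priority": 0},
    {"name": "burstable",  "usage_mi": 400, "requests_mi": 256, "priority": 0},
    {"name": "guaranteed", "usage_mi": 500, "requests_mi": 512, "priority": 0},
]
print(eviction_order(pods))  # ['besteffort', 'burstable', 'guaranteed']
```

Note how the BestEffort pod (requests of 0, so any usage is overage) ranks first, while the pod running within its requests ranks last.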
Symptoms
- kubectl describe node shows MemoryPressure condition as True
- Node has the taint node.kubernetes.io/memory-pressure:NoSchedule
- Pods are being evicted from the node
- kubectl top node shows high memory usage
- New pods cannot be scheduled to the affected node
Common Causes
- Pods without memory limits consuming unbounded memory
- Memory requests set far below actual usage, overcommitting the node
- Memory leaks in application workloads
- System daemons competing with pods because system-reserved is not configured
- Insufficient cluster capacity for the scheduled workload
Step-by-Step Troubleshooting
1. Identify Nodes with MemoryPressure
# Check node conditions
kubectl get nodes -o custom-columns='NAME:.metadata.name,STATUS:.status.conditions[?(@.type=="MemoryPressure")].status'
# Get detailed conditions
kubectl describe node <node-name> | grep -A5 "Conditions"
The condition will show:
MemoryPressure True KubeletHasInsufficientMemory ...
2. Check Node Memory Usage
# Check memory usage across nodes
kubectl top nodes
# Detailed memory info for the affected node
kubectl describe node <node-name> | grep -A15 "Allocated resources"
Look at the difference between allocatable memory, total requests, and actual usage.
3. Identify Memory-Hungry Pods
Find which pods are consuming the most memory on the node.
# List pods on the node sorted by memory
kubectl top pods -A --sort-by=memory | head -20
# Filter to specific node
kubectl get pods -A --field-selector spec.nodeName=<node-name> -o wide
# Per-node breakdown (one kubectl call per pod; slow on large clusters)
kubectl top pods -A --sort-by=memory --no-headers | while read ns pod cpu mem; do
  node=$(kubectl get pod "$pod" -n "$ns" -o jsonpath='{.spec.nodeName}' 2>/dev/null)
  if [ "$node" = "<node-name>" ]; then
    echo "$ns $pod $mem"
  fi
done
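To sort the resulting rows numerically, the memory quantities printed by `kubectl top` (such as "512Mi" or "2Gi") have to be normalized to a single unit. A small Python helper for this (illustrative; the sample rows are hand-built, not real cluster output):

```python
# Convert memory quantities as printed by `kubectl top` (e.g. "512Mi", "2Gi")
# into MiB so rows can be sorted numerically.

UNITS_MI = {"Ki": 1 / 1024, "Mi": 1, "Gi": 1024, "Ti": 1024 * 1024}

def to_mib(quantity: str) -> float:
    for suffix, factor in UNITS_MI.items():
        if quantity.endswith(suffix):
            return float(quantity[: -len(suffix)]) * factor
    return float(quantity) / (1024 * 1024)  # plain bytes

# Hypothetical (namespace, pod, memory) rows
rows = [("kube-system", "fluentd", "1Gi"), ("app", "api", "768Mi"), ("app", "worker", "256Mi")]
rows.sort(key=lambda r: to_mib(r[2]), reverse=True)
print(rows[0])  # ('kube-system', 'fluentd', '1Gi')
```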
4. Check for Pods Without Memory Limits
Pods without limits can consume unbounded memory.
# Find pods without memory limits on the node
kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json | \
jq -r '.items[] | select(any(.spec.containers[]; .resources.limits.memory == null)) | "\(.metadata.namespace)/\(.metadata.name)"'
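The same check can be expressed in Python over the JSON from `kubectl get pods -o json`. The pod objects below are hand-built samples for illustration:

```python
# Find pods where at least one container has no memory limit, given the
# JSON structure returned by `kubectl get pods -o json`.

def pods_without_memory_limits(pod_list):
    missing = []
    for pod in pod_list["items"]:
        containers = pod["spec"]["containers"]
        if any("memory" not in c.get("resources", {}).get("limits", {}) for c in containers):
            meta = pod["metadata"]
            missing.append(f'{meta["namespace"]}/{meta["name"]}')
    return missing

# Hand-built sample pods (not real cluster output)
sample = {"items": [
    {"metadata": {"namespace": "app", "name": "api"},
     "spec": {"containers": [{"resources": {"limits": {"memory": "512Mi"}}}]}},
    {"metadata": {"namespace": "app", "name": "worker"},
     "spec": {"containers": [{"resources": {}}]}},
]}
print(pods_without_memory_limits(sample))  # ['app/worker']
```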
5. Check QoS Classes of Pods on the Node
kubectl get pods -A --field-selector spec.nodeName=<node-name> -o json | \
jq -r '.items[] | "\(.metadata.namespace)/\(.metadata.name): \(.status.qosClass)"' | sort -t: -k2
BestEffort pods are evicted first, then Burstable pods exceeding their requests, and last Guaranteed pods and Burstable pods running within their requests. If all your critical pods are BestEffort, set proper resource requests.
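The QoS class is derived from a pod's container resources. A simplified sketch of the rules (it assumes requests are set explicitly; the real apiserver also defaults requests to limits when only limits are given):

```python
# Simplified derivation of a pod's QoS class from its container resources:
# Guaranteed  - every container sets requests == limits for both cpu and memory
# BestEffort  - no container sets any requests or limits
# Burstable   - everything else
# (Assumes explicit requests; the apiserver defaults requests to limits.)

def qos_class(containers):
    requests = [c.get("requests", {}) for c in containers]
    limits = [c.get("limits", {}) for c in containers]
    if not any(requests) and not any(limits):
        return "BestEffort"
    if all(
        r.get(res) is not None and r.get(res) == l.get(res)
        for r, l in zip(requests, limits)
        for res in ("cpu", "memory")
    ):
        return "Guaranteed"
    return "Burstable"

print(qos_class([{}]))                                                # BestEffort
print(qos_class([{"requests": {"cpu": "100m", "memory": "256Mi"},
                  "limits":   {"cpu": "100m", "memory": "256Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"memory": "256Mi"}}]))                 # Burstable
```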
6. Check Eviction Thresholds
# Check kubelet configuration for eviction thresholds
kubectl get --raw /api/v1/nodes/<node-name>/proxy/configz 2>/dev/null | jq '.kubeletconfig | {evictionHard, evictionSoft}'
Default hard eviction thresholds:
memory.available < 100Mi
Example soft eviction threshold (none are set by default; soft thresholds must be configured explicitly):
memory.available < 300Mi (evicts only after the configured grace period elapses)
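The hard/soft distinction can be modeled in a few lines. This is an illustrative sketch with example values, not kubelet defaults: a hard breach evicts immediately, while a soft breach evicts only after the condition has persisted for the grace period.

```python
# Illustrative model of hard vs soft eviction thresholds (example values,
# not kubelet defaults). Hard thresholds act immediately; soft thresholds
# act only after the breach has held for the grace period.

def should_evict(available_mi, held_seconds,
                 hard_mi=100, soft_mi=300, soft_grace_seconds=90):
    if available_mi < hard_mi:
        return True                                 # hard breach: evict now
    if available_mi < soft_mi:
        return held_seconds >= soft_grace_seconds   # soft breach: wait out grace
    return False

print(should_evict(80, 0))     # True  - below the hard threshold
print(should_evict(200, 30))   # False - soft breach, grace period not elapsed
print(should_evict(200, 120))  # True  - soft breach persisted past grace period
```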
7. Check System Reserved Memory
# Check reserved resources
kubectl get --raw /api/v1/nodes/<node-name>/proxy/configz 2>/dev/null | jq '.kubeletconfig | {systemReserved, kubeReserved}'
If system-reserved is not set, system processes compete with pods for memory. Configure it in kubelet:
# kubelet configuration (KubeletConfiguration)
systemReserved:
  memory: "512Mi"
kubeReserved:
  memory: "256Mi"
evictionHard:
  memory.available: "200Mi"
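With these reservations, the node's allocatable memory follows the Kubernetes formula: allocatable = capacity - kube-reserved - system-reserved - hard eviction threshold. A quick check with the values from the config above (the 8 GiB node capacity is an assumed example):

```python
# Node allocatable memory per the Kubernetes formula:
# allocatable = capacity - kubeReserved - systemReserved - evictionHard

def allocatable_mi(capacity_mi, kube_reserved_mi, system_reserved_mi, eviction_hard_mi):
    return capacity_mi - kube_reserved_mi - system_reserved_mi - eviction_hard_mi

# Assumed 8 GiB node with the reservations from the config snippet
print(allocatable_mi(8192, 256, 512, 200))  # 7224
```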
8. Immediate Relief: Reduce Memory Usage
If the node is actively under pressure, take immediate action.
# Delete non-critical pods
kubectl delete pod <non-critical-pod> -n <namespace>
# Scale down deployments on the node
kubectl scale deployment <deploy-name> -n <namespace> --replicas=<fewer>
# Cordon the node to prevent new scheduling
kubectl cordon <node-name>
9. Long-Term Fix: Set Memory Limits
Enforce memory limits across all namespaces with LimitRanges.
apiVersion: v1
kind: LimitRange
metadata:
  name: memory-defaults
  namespace: <namespace>
spec:
  limits:
  - default:
      memory: "512Mi"
    defaultRequest:
      memory: "256Mi"
    max:
      memory: "2Gi"
    min:
      memory: "64Mi"
    type: Container
kubectl apply -f limitrange.yaml
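The defaulting behavior can be sketched as follows. This is an illustrative model of what admission does, not the apiserver implementation: containers that omit memory requests or limits receive defaultRequest and default from the LimitRange.

```python
# Illustrative model of LimitRange defaulting at admission (not the real
# apiserver code): containers without memory requests/limits receive the
# LimitRange's defaultRequest/default values.

DEFAULTS = {"request": "256Mi", "limit": "512Mi"}  # mirrors the LimitRange above

def apply_limitrange(container):
    resources = container.setdefault("resources", {})
    resources.setdefault("limits", {}).setdefault("memory", DEFAULTS["limit"])
    resources.setdefault("requests", {}).setdefault("memory", DEFAULTS["request"])
    return container

c = apply_limitrange({"name": "app"})
print(c["resources"])  # {'limits': {'memory': '512Mi'}, 'requests': {'memory': '256Mi'}}
```

A container that already sets its own memory limit is left untouched, which matches how LimitRange defaults only fill in missing values.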
10. Scale the Cluster
If the cluster genuinely needs more memory capacity:
# Check Cluster Autoscaler status
kubectl get pods -n kube-system -l app=cluster-autoscaler
# Manually scale node group (cloud-specific)
# AWS EKS
eksctl scale nodegroup --cluster=<cluster> --name=<ng> --nodes=5
# GKE
gcloud container clusters resize <cluster> --node-pool=<pool> --num-nodes=5
11. Verify Resolution
# Check that MemoryPressure is cleared
kubectl describe node <node-name> | grep MemoryPressure
# Check the taint is removed
kubectl describe node <node-name> | grep memory-pressure
# Uncordon if previously cordoned
kubectl uncordon <node-name>
# Verify pods are running
kubectl get pods -A --field-selector spec.nodeName=<node-name>
The MemoryPressure condition should transition to False and the taint should be removed automatically once available memory rises above the threshold.
How to Explain This in an Interview
I would explain that MemoryPressure is a node condition, not a pod error. The kubelet monitors available memory and triggers this condition when it falls below the eviction threshold (default: 100Mi available). I would describe the eviction order based on QoS classes and how the kubelet uses soft and hard eviction thresholds. I would also discuss the difference between MemoryPressure (node-level) and OOMKilled (container-level), and how proper resource management with requests, limits, and LimitRanges prevents both.
Prevention
- Set appropriate kube-reserved and system-reserved on each node
- Enforce memory limits on all pods with LimitRanges
- Monitor node memory usage and alert before pressure conditions
- Use Cluster Autoscaler to add nodes when capacity is low
- Right-size pods with VPA recommendations
- Set memory requests equal to limits for critical workloads (Guaranteed QoS)