Kubernetes FailedScheduling

Causes and Fixes

FailedScheduling is a pod event indicating the Kubernetes scheduler could not find a suitable node to place the pod. The pod remains in Pending state until the scheduling constraints can be satisfied. This is one of the most common reasons pods fail to start.

Symptoms

  • Pod status shows Pending indefinitely
  • Pod events show 'FailedScheduling' with a reason message
  • Events list specific constraint violations like 'Insufficient cpu' or 'node(s) didn't match'
  • Multiple pods pile up in Pending state
  • kubectl describe pod shows 0/N nodes are available

Common Causes

1. Insufficient CPU or memory resources
No node has enough allocatable CPU or memory to satisfy the pod's resource requests. Either all nodes are at capacity, or the requests are larger than any single node can provide.

2. Node affinity or anti-affinity not satisfiable
The pod's nodeAffinity or podAffinity/podAntiAffinity rules cannot be satisfied by any available node.

3. Taints with no matching tolerations
All suitable nodes have taints (like node-role.kubernetes.io/control-plane) that the pod does not tolerate.

4. No nodes match nodeSelector
The pod specifies a nodeSelector with labels that no node in the cluster has.

5. PersistentVolumeClaim not bound
The pod references a PVC that is still in Pending state, and the scheduler cannot place the pod until the volume is available.

6. Topology spread constraints unsatisfiable
The pod's topologySpreadConstraints cannot be satisfied while maintaining the required spread across zones or nodes.

Step-by-Step Troubleshooting

When a pod is stuck in Pending with FailedScheduling, the scheduler is telling you exactly what constraints cannot be met. This guide walks through reading the scheduler's message and resolving each type of scheduling failure.

1. Read the Scheduler's Error Message

The scheduler provides detailed reasons for why no node was suitable.

kubectl describe pod <pod-name>

Look at the Events section for the FailedScheduling event. The message is highly specific, for example:

0/5 nodes are available: 2 Insufficient cpu, 1 Insufficient memory,
2 node(s) had taint {node-role.kubernetes.io/control-plane: },
that the pod didn't tolerate.

This tells you exactly how many nodes failed each constraint. Parse it carefully — it accounts for every node in the cluster.

2. Check Cluster Resource Availability

If the message mentions "Insufficient cpu" or "Insufficient memory":

# Check node resource allocation
kubectl describe nodes | grep -A6 "Allocated resources"

# Check current node resource usage (requires metrics-server; shows actual usage, not requests)
kubectl top nodes

# Check specific node capacity and allocatable
kubectl get node <node-name> -o jsonpath='{.status.allocatable}' | jq .

# Check the pod's resource requests
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].resources.requests}' | jq .

Compare the pod's requests against available capacity on each node. Remember that allocatable resources = capacity minus system reserved minus kube reserved.

# List allocatable CPU and memory per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory
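The comparison behind "Insufficient cpu" can be sketched with assumed numbers (illustrative values, not read from a live cluster): a node fits only if its allocatable capacity minus the requests of pods already placed on it covers the new pod's request.

```shell
node_allocatable_mcpu=4000   # node allocatable: 4 CPUs = 4000 millicores
node_requested_mcpu=3800     # sum of CPU requests from pods already on the node
pod_request_mcpu=500         # this pod's CPU request (500m)

# Free capacity is allocatable minus what is already requested
free_mcpu=$((node_allocatable_mcpu - node_requested_mcpu))
if [ "$free_mcpu" -ge "$pod_request_mcpu" ]; then
  echo "node fits"
else
  echo "Insufficient cpu"   # the condition behind the scheduler's message
fi
```

Note that the check uses requests, not actual usage: a node full of over-requesting but idle pods still rejects new pods.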

3. Resolve Resource Shortages

Several options exist for resource-related scheduling failures.

# Option 1: Reduce pod resource requests if they are over-provisioned
kubectl set resources deployment <deployment-name> --requests=cpu=100m,memory=128Mi

# Option 2: Scale down other workloads to free resources
kubectl scale deployment <low-priority-deployment> --replicas=0

# Option 3: Add more nodes to the cluster
# (cloud-specific: use console, CLI, or cluster autoscaler)

# Option 4: Use pod priority to preempt lower priority pods
kubectl get priorityclass
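For option 4, a PriorityClass could look like the following sketch; the name and value are hypothetical:

```yaml
# Hypothetical PriorityClass: pods referencing it may preempt
# lower-priority pods when no node otherwise fits.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-workload    # hypothetical name
value: 1000000               # higher value = higher priority
globalDefault: false
description: "For workloads allowed to preempt lower-priority pods."
```

Pods opt in via `priorityClassName: critical-workload` in their spec.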

4. Check Node Selectors

If the message mentions "node(s) didn't match Pod's node selector":

# Check the pod's nodeSelector
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeSelector}' | jq .

# List nodes with their labels
kubectl get nodes --show-labels

# Check if any node has the required label
kubectl get nodes -l <key>=<value>

If no nodes match, either add the label to appropriate nodes or update the pod's nodeSelector.

# Add a label to a node
kubectl label node <node-name> <key>=<value>

# Or remove the nodeSelector from the deployment
kubectl patch deployment <deployment-name> --type=json -p='[{"op":"remove","path":"/spec/template/spec/nodeSelector"}]'

5. Check Taints and Tolerations

If the message mentions "had taint ... that the pod didn't tolerate":

# List all node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Check specific node taints
kubectl describe node <node-name> | grep -A5 Taints

Common taints include:

  • node-role.kubernetes.io/control-plane:NoSchedule — control plane nodes
  • node.kubernetes.io/not-ready:NoSchedule — node is not ready
  • node.kubernetes.io/disk-pressure:NoSchedule — disk pressure
  • Custom taints for dedicated node pools

# Add a toleration to the deployment
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "tolerations": [
          {
            "key": "<taint-key>",
            "operator": "Equal",
            "value": "<taint-value>",
            "effect": "NoSchedule"
          }
        ]
      }
    }
  }
}'

6. Check Node Affinity Rules

If affinity rules are preventing scheduling:

# Check the pod's affinity configuration
kubectl get pod <pod-name> -o yaml | grep -A30 "affinity:"

Node affinity can be required (requiredDuringSchedulingIgnoredDuringExecution, a hard constraint) or preferred (preferredDuringSchedulingIgnoredDuringExecution, a soft constraint). Required rules that cannot be satisfied block scheduling entirely; preferred rules only influence node scoring.

# Check which nodes match the affinity rules
kubectl get nodes -l <affinity-label-key>=<affinity-label-value>

# If no nodes match, add the label or relax the affinity
kubectl label node <node-name> <key>=<value>
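For reference, the two flavors look like this in a pod spec; the `disktype: ssd` label and zone value are hypothetical examples:

```yaml
# Minimal sketch: requires nodes labeled disktype=ssd (hard constraint),
# prefers nodes in a specific zone (soft constraint).
apiVersion: v1
kind: Pod
metadata:
  name: affinity-demo        # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard: blocks scheduling if unmatched
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In
            values: ["ssd"]
      preferredDuringSchedulingIgnoredDuringExecution:  # soft: affects scoring only
      - weight: 50
        preference:
          matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values: ["us-east-1a"]
  containers:
  - name: app
    image: nginx
```

Relaxing a blocking rule usually means moving it from the required block to the preferred block.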

7. Check Pod Anti-Affinity

Pod anti-affinity rules prevent pods from being co-located.

# Check anti-affinity configuration
kubectl get pod <pod-name> -o yaml | grep -A20 "podAntiAffinity:"

If required anti-affinity spreads pods across nodes and there are not enough nodes, scheduling will fail. Either add more nodes or change from Required to Preferred anti-affinity.
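A preferred anti-affinity rule could be sketched as follows (the `app: web` selector is a hypothetical label):

```yaml
# Minimal sketch: prefer, rather than require, spreading pods labeled
# app=web across nodes, so scheduling never hard-fails on node count.
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname   # spread across distinct nodes
```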

8. Check Volume Topology Constraints

If the message mentions PVC-related issues:

# Find the PVCs the pod references, then check whether they are bound
kubectl get pod <pod-name> -o jsonpath='{.spec.volumes[*].persistentVolumeClaim.claimName}'
kubectl get pvc <pvc-name>

# Check volume topology
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'

If a PVC is bound to a PV in a specific zone, the pod can only be scheduled to nodes in that zone.
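The PV's nodeAffinity stanza that pins scheduling to a zone may look like this sketch (zone value illustrative):

```yaml
# Sketch of a zonal PV: the volume is reachable only from nodes in
# us-east-1a, so any pod using it must land in that zone.
nodeAffinity:
  required:
    nodeSelectorTerms:
    - matchExpressions:
      - key: topology.kubernetes.io/zone
        operator: In
        values: ["us-east-1a"]
```

If no schedulable node exists in that zone, the pod stays Pending until one does.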

9. Check Topology Spread Constraints

# Check the pod's topology spread constraints
kubectl get pod <pod-name> -o yaml | grep -A10 "topologySpreadConstraints:"

If the maximum skew would be violated, the scheduler rejects the placement. Consider raising maxSkew or using whenUnsatisfiable: ScheduleAnyway instead of DoNotSchedule.
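A relaxed constraint could be sketched like this (the `app: web` selector is a hypothetical label):

```yaml
# Sketch: tolerate a skew of 2 across zones, and fall back to scheduling
# anyway rather than leaving the pod Pending.
topologySpreadConstraints:
- maxSkew: 2
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: ScheduleAnyway   # DoNotSchedule would hard-fail instead
  labelSelector:
    matchLabels:
      app: web
```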

10. Verify the Pod Is Scheduled

After resolving the constraint, verify the pod gets scheduled.

# Watch the pod status
kubectl get pod <pod-name> -w

# Check which node it was scheduled to
kubectl get pod <pod-name> -o wide

# Verify no more FailedScheduling events
kubectl describe pod <pod-name> | tail -20

The pod should transition from Pending to ContainerCreating to Running. If it remains Pending with a new FailedScheduling reason, there may be multiple constraints to resolve — read the updated error message and address the next constraint.

How to Explain This in an Interview

I would explain the Kubernetes scheduler's workflow: it filters nodes that meet all hard constraints (resource requirements, nodeSelector, taints/tolerations, affinity, volume topology), then scores the remaining nodes to find the best fit. FailedScheduling means the filter phase eliminated all nodes. I'd walk through reading the scheduler's reason message, which lists exactly which constraints each node failed, and discuss common scenarios: resource pressure, misconfigured affinity rules, forgotten taint tolerations, and volume topology issues. I'd also cover the difference between hard (Required) and soft (Preferred) scheduling constraints, and how the cluster autoscaler interacts with scheduling.

Prevention

  • Set realistic resource requests based on actual usage data
  • Monitor cluster resource utilization and capacity
  • Use pod priorities and preemption for critical workloads
  • Test scheduling constraints in staging before production
  • Configure cluster autoscaler to add nodes when capacity is exhausted
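For the first point, right-sized requests might look like this sketch; the numbers are illustrative and should come from your own usage data:

```yaml
# Sketch: requests sized from observed usage, not guesses.
resources:
  requests:
    cpu: 250m        # e.g. near the observed p95 CPU usage
    memory: 256Mi
  limits:
    memory: 512Mi    # headroom above the request
```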

Related Errors