How Do Taints and Tolerations Work in Kubernetes?
Taints are applied to nodes to repel Pods that do not tolerate them. Tolerations are applied to Pods to allow scheduling onto tainted nodes. Together they ensure only specific workloads run on designated nodes, such as GPU nodes, dedicated tenant nodes, or control plane nodes.
Detailed Answer
Taints: Repelling Pods from Nodes
A taint marks a node so that no Pod will be scheduled on it unless the Pod explicitly tolerates the taint. Taints have three components: a key, an optional value, and an effect.
# Add a taint to a node
kubectl taint nodes worker-1 dedicated=gpu:NoSchedule
# View taints on a node
kubectl describe node worker-1 | grep -A 5 Taints
# Remove a taint (note the trailing minus)
kubectl taint nodes worker-1 dedicated=gpu:NoSchedule-
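If you do not remember the exact value, a taint can also be removed by key alone; this removes every taint on the node with that key, regardless of value or effect (worker-1 and dedicated follow the example above):

```shell
# Remove all taints with the key "dedicated" from worker-1
kubectl taint nodes worker-1 dedicated-
```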
The Three Taint Effects
NoSchedule
New Pods without a matching toleration will not be scheduled on the node. Existing Pods are not affected.
kubectl taint nodes worker-1 environment=production:NoSchedule
PreferNoSchedule
The scheduler tries to avoid placing Pods on the node, but will do so if no other nodes are available. It is a soft version of NoSchedule.
kubectl taint nodes worker-2 environment=staging:PreferNoSchedule
NoExecute
New Pods are not scheduled, and existing Pods without a matching toleration are evicted immediately. This is the strongest effect.
kubectl taint nodes worker-3 maintenance=true:NoExecute
Tolerations: Allowing Pods onto Tainted Nodes
Tolerations are specified in the Pod spec. A toleration matches a taint when the key and effect align and, with the Equal operator, the value matches as well.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: ml-training
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        nvidia.com/gpu: 1
Toleration Operators
Equal (default)
The key, value, and effect must all match:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
Exists
Only the key and effect must match. The value is ignored:
tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
Tolerate All Taints
An empty toleration with Exists matches everything. Use this for DaemonSets that must run everywhere:
tolerations:
- operator: "Exists"
tolerationSeconds with NoExecute
When a NoExecute taint is applied and a Pod has a matching toleration, you can control how long the Pod stays before eviction:
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300  # Stay for 5 minutes after the node becomes NotReady
Without tolerationSeconds, the Pod tolerates the taint indefinitely. With it, the Pod is evicted after the specified duration.
Built-in Taints
Kubernetes automatically applies taints based on node conditions:
| Taint | Condition |
|---|---|
| node.kubernetes.io/not-ready | Node is not ready |
| node.kubernetes.io/unreachable | Node controller cannot reach the node |
| node.kubernetes.io/memory-pressure | Node has memory pressure |
| node.kubernetes.io/disk-pressure | Node has disk pressure |
| node.kubernetes.io/pid-pressure | Node has too many processes |
| node.kubernetes.io/unschedulable | Node is cordoned |
The DefaultTolerationSeconds admission controller automatically adds tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable to every Pod, with a 300-second grace period by default.
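For example, a Pod that declares no tolerations of its own still ends up with entries roughly like these when you inspect it with kubectl get pod -o yaml:

```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```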
Practical Examples
Dedicated GPU Nodes
# Taint the GPU nodes
kubectl taint nodes gpu-node-1 gpu-node-2 accelerator=nvidia:NoSchedule
# Only GPU workloads can be scheduled on GPU nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-inference
  template:
    metadata:
      labels:
        app: ml-inference
    spec:
      tolerations:
      - key: "accelerator"
        operator: "Equal"
        value: "nvidia"
        effect: "NoSchedule"
      nodeSelector:
        accelerator: nvidia
      containers:
      - name: model
        image: ml-model:v1
        resources:
          limits:
            nvidia.com/gpu: 1
Note: Tolerations alone do not attract Pods to specific nodes. You need nodeSelector or node affinity in addition to ensure the Pod is scheduled on the GPU node, not just allowed there.
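One way to confirm the isolation works as intended is to check where the replicas actually landed; the label and node names here follow the example Deployment above:

```shell
# Show which node each replica was scheduled on
kubectl get pods -l app=ml-inference -o wide
```

If a Pod stays Pending instead, kubectl describe pod on it will typically report an "untolerated taint" message from the scheduler.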
Node Maintenance
# Taint the node for maintenance - existing Pods will be evicted
kubectl taint nodes worker-1 maintenance=planned:NoExecute
# Perform maintenance...
# Remove the taint when done
kubectl taint nodes worker-1 maintenance=planned:NoExecute-
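Before starting maintenance, it is worth sanity-checking that the eviction actually happened by listing what is still running on the node (worker-1 as in the example). DaemonSet Pods that tolerate all taints will remain, which is expected:

```shell
# List all Pods still scheduled on worker-1
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-1
```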
Control Plane Isolation
By default, control plane nodes have this taint:
node-role.kubernetes.io/control-plane:NoSchedule
This prevents regular workloads from running on control plane nodes. To schedule a Pod on the control plane (e.g., for single-node clusters):
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
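Alternatively, for a single-node cluster you can remove the taint from the node itself rather than adding a toleration to every workload; the trailing minus removes the taint and --all applies the change to all nodes:

```shell
# Allow regular workloads on all control plane nodes
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```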
Combining Taints with Node Affinity
Taints repel unwanted Pods; node affinity attracts wanted Pods. Use both together for full workload isolation:
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "team-a"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - team-a
This ensures the Pod both tolerates the taint and is attracted to the correct nodes.
Why Interviewers Ask This
Interviewers ask this to test your understanding of advanced scheduling mechanisms and your ability to isolate workloads on specific nodes.
Key Takeaways
- Taints go on nodes; tolerations go on Pods
- NoExecute evicts running Pods; NoSchedule only affects new scheduling
- Kubernetes auto-taints nodes based on conditions like NotReady