How Do Taints and Tolerations Work in Kubernetes?
Taints are applied to nodes to repel Pods that do not tolerate them. Tolerations are applied to Pods to allow scheduling onto tainted nodes. Together they ensure only specific workloads run on designated nodes, such as GPU nodes, dedicated tenant nodes, or control plane nodes.
Detailed Answer
Taints: Repelling Pods from Nodes
A taint marks a node so that no Pod will be scheduled on it unless the Pod explicitly tolerates the taint. Taints have three components: a key, an optional value, and an effect.
# Add a taint to a node
kubectl taint nodes worker-1 dedicated=gpu:NoSchedule
# View taints on a node
kubectl describe node worker-1 | grep -A 5 Taints
# Remove a taint (note the trailing minus)
kubectl taint nodes worker-1 dedicated=gpu:NoSchedule-
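If you do not remember the exact value, a taint can also be removed by key alone; this removes every taint on the node with that key, regardless of value or effect (worker-1 and dedicated follow the example above):

```shell
# Remove all taints with the key "dedicated" from worker-1
kubectl taint nodes worker-1 dedicated-
```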
The Three Taint Effects
NoSchedule
New Pods without a matching toleration will not be scheduled on the node. Existing Pods are not affected.
kubectl taint nodes worker-1 environment=production:NoSchedule
PreferNoSchedule
The scheduler tries to avoid placing Pods on the node, but will do so if no other nodes are available. It is a soft version of NoSchedule.
kubectl taint nodes worker-2 environment=staging:PreferNoSchedule
NoExecute
New Pods are not scheduled, and existing Pods without a matching toleration are evicted immediately. This is the strongest effect.
kubectl taint nodes worker-3 maintenance=true:NoExecute
Tolerations: Allowing Pods onto Tainted Nodes
Tolerations are specified in the Pod spec. A toleration matches a taint when the key and effect align and, with the Equal operator, the value matches as well.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: ml-training
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        nvidia.com/gpu: 1
Toleration Operators
Equal (default)
The key, value, and effect must all match:
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoSchedule"
Exists
Only the key and effect must match. The value is ignored:
tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
Tolerate All Taints
An empty toleration with Exists matches everything. Use this for DaemonSets that must run everywhere:
tolerations:
- operator: "Exists"
tolerationSeconds with NoExecute
When a NoExecute taint is applied and a Pod has a matching toleration, you can control how long the Pod stays before eviction:
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300  # Stay for 5 minutes after the node becomes NotReady
Without tolerationSeconds, the Pod tolerates the taint indefinitely. With it, the Pod is evicted after the specified duration.
Built-in Taints
Kubernetes automatically applies taints based on node conditions:
| Taint | Condition |
|---|---|
| node.kubernetes.io/not-ready | Node is not ready |
| node.kubernetes.io/unreachable | Node controller cannot reach the node |
| node.kubernetes.io/memory-pressure | Node has memory pressure |
| node.kubernetes.io/disk-pressure | Node has disk pressure |
| node.kubernetes.io/pid-pressure | Node has too many processes |
| node.kubernetes.io/unschedulable | Node is cordoned |
The DefaultTolerationSeconds admission controller automatically adds tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable to every Pod, with a 300-second grace period by default.
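For example, a Pod that declares no tolerations of its own still ends up with entries roughly like these when you inspect it with kubectl get pod -o yaml:

```yaml
tolerations:
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300
```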
Practical Examples
Dedicated GPU Nodes
# Taint the GPU nodes
kubectl taint nodes gpu-node-1 gpu-node-2 accelerator=nvidia:NoSchedule
# Only GPU workloads can be scheduled on GPU nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ml-inference
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ml-inference
  template:
    metadata:
      labels:
        app: ml-inference
    spec:
      tolerations:
      - key: "accelerator"
        operator: "Equal"
        value: "nvidia"
        effect: "NoSchedule"
      nodeSelector:
        accelerator: nvidia
      containers:
      - name: model
        image: ml-model:v1
        resources:
          limits:
            nvidia.com/gpu: 1
Note: Tolerations alone do not attract Pods to specific nodes. You need nodeSelector or node affinity in addition to ensure the Pod is scheduled on the GPU node, not just allowed there.
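One way to confirm the isolation works as intended is to check where the replicas actually landed; the label and node names here follow the example Deployment above:

```shell
# Show which node each replica was scheduled on
kubectl get pods -l app=ml-inference -o wide
```

If a Pod stays Pending instead, kubectl describe pod on it will typically report an "untolerated taint" message from the scheduler.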
Node Maintenance
# Taint the node for maintenance - existing Pods will be evicted
kubectl taint nodes worker-1 maintenance=planned:NoExecute
# Perform maintenance...
# Remove the taint when done
kubectl taint nodes worker-1 maintenance=planned:NoExecute-
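Before starting maintenance, it is worth sanity-checking that the eviction actually happened by listing what is still running on the node (worker-1 as in the example). DaemonSet Pods that tolerate all taints will remain, which is expected:

```shell
# List all Pods still scheduled on worker-1
kubectl get pods --all-namespaces --field-selector spec.nodeName=worker-1
```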
Control Plane Isolation
By default, control plane nodes have this taint:
node-role.kubernetes.io/control-plane:NoSchedule
This prevents regular workloads from running on control plane nodes. To schedule a Pod on the control plane (e.g., for single-node clusters):
tolerations:
- key: "node-role.kubernetes.io/control-plane"
  operator: "Exists"
  effect: "NoSchedule"
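Alternatively, for a single-node cluster you can remove the taint from the node itself rather than adding a toleration to every workload; the trailing minus removes the taint and --all applies the change to all nodes:

```shell
# Allow regular workloads on all control plane nodes
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
```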
Combining Taints with Node Affinity
Taints repel unwanted Pods; node affinity attracts wanted Pods. Use both together for full workload isolation:
spec:
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "team-a"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - team-a
This ensures the Pod both tolerates the taint and is attracted to the correct nodes.
Why Interviewers Ask This
Interviewers ask this to test your understanding of advanced scheduling mechanisms and your ability to isolate workloads on specific nodes.
Key Takeaways
- Taints go on nodes; tolerations go on Pods
- NoExecute evicts running Pods; NoSchedule only affects new scheduling
- Kubernetes auto-taints nodes based on conditions like NotReady