How Does Node Affinity Work in Kubernetes?
Node affinity is an advanced scheduling mechanism that attracts Pods to nodes based on label expressions. It comes in two forms: requiredDuringSchedulingIgnoredDuringExecution (a hard requirement) and preferredDuringSchedulingIgnoredDuringExecution (a soft preference). It supersedes nodeSelector with more expressive matching capabilities, though nodeSelector remains supported.
Detailed Answer
nodeSelector vs. Node Affinity
The simplest way to constrain Pods to specific nodes is nodeSelector:
```yaml
spec:
  nodeSelector:
    disktype: ssd
    region: us-east-1
```
This requires an exact label match and supports only equality. Node affinity extends this with richer operators, soft preferences, and weighted scoring.
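For comparison, the nodeSelector above can be translated directly into a required node affinity (same labels, now expressed with the In operator):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
              - key: region
                operator: In
                values:
                  - us-east-1
```

The affinity form behaves identically here, but unlike nodeSelector it can be extended with additional values, other operators, or soft preferences.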
Required Node Affinity (Hard Constraint)
requiredDuringSchedulingIgnoredDuringExecution is a hard requirement. The Pod will only be scheduled on nodes that match the expressions. If no nodes match, the Pod stays Pending.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator
                operator: In
                values:
                  - nvidia-tesla-v100
                  - nvidia-a100
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
                  - us-east-1b
  containers:
    - name: gpu-app
      image: ml-model:latest
```
In this example, the two expressions within the single term are ANDed: the node must have an accelerator label matching one of the listed GPUs AND be in one of the listed zones.
Multiple Selector Terms (OR Logic)
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions: # Term 1
            - key: accelerator
              operator: In
              values:
                - nvidia-a100
        - matchExpressions: # Term 2
            - key: accelerator
              operator: In
              values:
                - nvidia-h100
```
Multiple terms in nodeSelectorTerms are ORed: the node must match Term 1 OR Term 2.
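OR across terms is most useful when the alternatives involve different label keys, since a single In list cannot express that. As an illustrative sketch (the workload-class label is hypothetical), this accepts any node that has an accelerator label OR is explicitly labeled for batch work:

```yaml
nodeSelectorTerms:
  - matchExpressions: # Term 1: any node with an accelerator
      - key: accelerator
        operator: Exists
  - matchExpressions: # Term 2: dedicated batch nodes
      - key: workload-class
        operator: In
        values:
          - batch
```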
Preferred Node Affinity (Soft Constraint)
preferredDuringSchedulingIgnoredDuringExecution tells the scheduler to prefer certain nodes but does not make it mandatory. Each preference has a weight (1-100) that influences scheduling decisions.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - compute-optimized
            - weight: 20
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - us-east-1a
      containers:
        - name: web
          image: nginx:1.25
```
For each node that passes any required filters, the scheduler sums the weights of the preferences the node satisfies. In this example, compute-optimized nodes get 80 points and nodes in us-east-1a get 20 points, so a node matching both gets 100. This affinity score is combined with the scheduler's other scoring criteria, and the Pod lands on the highest-scoring node.
Operators
| Operator | Meaning |
|---|---|
| In | Label value is in the list |
| NotIn | Label value is not in the list |
| Exists | Label key exists (value ignored) |
| DoesNotExist | Label key does not exist |
| Gt | Label value is greater than (numeric comparison) |
| Lt | Label value is less than (numeric comparison) |
```yaml
# Example: Schedule on nodes with at least 8 GPUs (Gt 7 means "greater than 7")
matchExpressions:
  - key: gpu-count
    operator: Gt
    values:
      - "7" # Must be a string, compared as an integer
```
Combining Required and Preferred
You can use both simultaneously. The required affinity filters eligible nodes, and the preferred affinity ranks them:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node-type
              operator: In
              values:
                - high-memory
```
This means: the Pod MUST run on amd64 architecture, and PREFERS high-memory nodes.
Practical Use Cases
Zone-Aware Scheduling
```yaml
# Require Pods to run in specific availability zones
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-east-1a
            - us-east-1b
```
Architecture-Specific Workloads
```yaml
# Run on ARM64 nodes
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
```
Cost Optimization with Spot Instances
```yaml
# Prefer spot instances but fall back to on-demand
preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 90
    preference:
      matchExpressions:
        - key: node-lifecycle
          operator: In
          values:
            - spot
  - weight: 10
    preference:
      matchExpressions:
        - key: node-lifecycle
          operator: In
          values:
            - on-demand
```
Verifying Node Labels
```shell
# List all node labels
kubectl get nodes --show-labels

# Check labels on a specific node
kubectl describe node worker-1 | grep -A 20 Labels

# Add a label to a node
kubectl label nodes worker-1 node-type=compute-optimized

# Remove a label
kubectl label nodes worker-1 node-type-

# Check why a Pod is Pending (scheduling failure)
kubectl describe pod gpu-app | grep -A 10 Events
```
IgnoredDuringExecution
The "IgnoredDuringExecution" suffix means that if a node's labels change after a Pod is already running on it, the Pod is not evicted. Kubernetes plans to add RequiredDuringExecution in the future, which would evict Pods when the node no longer matches. For now, use taints with NoExecute if you need to evict running Pods.
Why Interviewers Ask This
Interviewers ask this to test whether you can control Pod placement for performance, compliance, or hardware requirements using the Kubernetes scheduler.
Common Follow-Up Questions
Key Takeaways
- Node affinity attracts Pods to nodes; taints do the opposite by repelling them
- required is a hard constraint; preferred is a soft constraint with weights
- Multiple selector terms are ORed; expressions within a term are ANDed