How Does Node Affinity Work in Kubernetes?
Node affinity is an advanced scheduling mechanism that attracts Pods to nodes based on label expressions. It comes in two forms: requiredDuringSchedulingIgnoredDuringExecution (a hard requirement) and preferredDuringSchedulingIgnoredDuringExecution (a soft preference). It supersedes nodeSelector with more expressive matching capabilities, though nodeSelector remains supported.
Detailed Answer
nodeSelector vs. Node Affinity
The simplest way to constrain Pods to specific nodes is nodeSelector:
```yaml
spec:
  nodeSelector:
    disktype: ssd
    region: us-east-1
```
This requires an exact label match and supports only equality. Node affinity extends this with richer operators, soft preferences, and weighted scoring.
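For comparison, the nodeSelector above can be translated directly into a required node affinity (same labels, now expressed with the In operator):

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: disktype
                operator: In
                values:
                  - ssd
              - key: region
                operator: In
                values:
                  - us-east-1
```

The affinity form behaves identically here, but unlike nodeSelector it can be extended with additional values, other operators, or soft preferences.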
Required Node Affinity (Hard Constraint)
requiredDuringSchedulingIgnoredDuringExecution is a hard requirement. The Pod will only be scheduled on nodes that match the expressions. If no nodes match, the Pod stays Pending.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-app
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator
                operator: In
                values:
                  - nvidia-tesla-v100
                  - nvidia-a100
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - us-east-1a
                  - us-east-1b
  containers:
    - name: gpu-app
      image: ml-model:latest
```
In this example, the two expressions within the single term are ANDed: the node must have an accelerator label matching one of the listed GPUs AND be in one of the listed zones.
Multiple Selector Terms (OR Logic)
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions: # Term 1
            - key: accelerator
              operator: In
              values:
                - nvidia-a100
        - matchExpressions: # Term 2
            - key: accelerator
              operator: In
              values:
                - nvidia-h100
```
Multiple terms in nodeSelectorTerms are ORed: the node must match Term 1 OR Term 2.
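OR across terms is most useful when the alternatives involve different label keys, since a single In list cannot express that. As an illustrative sketch (the workload-class label is hypothetical), this accepts any node that has an accelerator label OR is explicitly labeled for batch work:

```yaml
nodeSelectorTerms:
  - matchExpressions: # Term 1: any node with an accelerator
      - key: accelerator
        operator: Exists
  - matchExpressions: # Term 2: dedicated batch nodes
      - key: workload-class
        operator: In
        values:
          - batch
```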
Preferred Node Affinity (Soft Constraint)
preferredDuringSchedulingIgnoredDuringExecution tells the scheduler to prefer certain nodes but does not make it mandatory. Each preference has a weight (1-100) that influences scheduling decisions.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 80
              preference:
                matchExpressions:
                  - key: node-type
                    operator: In
                    values:
                      - compute-optimized
            - weight: 20
              preference:
                matchExpressions:
                  - key: topology.kubernetes.io/zone
                    operator: In
                    values:
                      - us-east-1a
      containers:
        - name: web
          image: nginx:1.25
```
For each node that passes any required filters, the scheduler sums the weights of the preferences the node satisfies. In this example, compute-optimized nodes get 80 points and nodes in us-east-1a get 20 points, so a node matching both gets 100. This affinity score is combined with the scheduler's other scoring criteria, and the Pod lands on the highest-scoring node.
Operators
| Operator | Meaning |
|---|---|
| In | Label value is in the list |
| NotIn | Label value is not in the list |
| Exists | Label key exists (value ignored) |
| DoesNotExist | Label key does not exist |
| Gt | Label value is greater than (numeric comparison) |
| Lt | Label value is less than (numeric comparison) |
```yaml
# Example: Schedule on nodes with at least 8 GPUs (Gt 7 means "greater than 7")
matchExpressions:
  - key: gpu-count
    operator: Gt
    values:
      - "7" # Must be a string, compared as an integer
```
Combining Required and Preferred
You can use both simultaneously. The required affinity filters eligible nodes, and the preferred affinity ranks them:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/arch
              operator: In
              values:
                - amd64
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
            - key: node-type
              operator: In
              values:
                - high-memory
```
This means: the Pod MUST run on amd64 architecture, and PREFERS high-memory nodes.
Practical Use Cases
Zone-Aware Scheduling
```yaml
# Require Pods to run in specific availability zones
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - us-east-1a
            - us-east-1b
```
Architecture-Specific Workloads
```yaml
# Run on ARM64 nodes
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
    - matchExpressions:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
```
Cost Optimization with Spot Instances
```yaml
# Prefer spot instances but fall back to on-demand
preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 90
    preference:
      matchExpressions:
        - key: node-lifecycle
          operator: In
          values:
            - spot
  - weight: 10
    preference:
      matchExpressions:
        - key: node-lifecycle
          operator: In
          values:
            - on-demand
```
Verifying Node Labels
```shell
# List all node labels
kubectl get nodes --show-labels

# Check labels on a specific node
kubectl describe node worker-1 | grep -A 20 Labels

# Add a label to a node
kubectl label nodes worker-1 node-type=compute-optimized

# Remove a label
kubectl label nodes worker-1 node-type-

# Check why a Pod is Pending (scheduling failure)
kubectl describe pod gpu-app | grep -A 10 Events
```
IgnoredDuringExecution
The "IgnoredDuringExecution" suffix means that if a node's labels change after a Pod is already running on it, the Pod is not evicted. Kubernetes plans to add RequiredDuringExecution in the future, which would evict Pods when the node no longer matches. For now, use taints with NoExecute if you need to evict running Pods.
Why Interviewers Ask This
Interviewers ask this to test whether you can control Pod placement for performance, compliance, or hardware requirements using the Kubernetes scheduler.
Common Follow-Up Questions
Key Takeaways
- Node affinity attracts Pods to nodes; taints do the opposite by repelling them
- required is a hard constraint; preferred is a soft constraint with weights
- Multiple selector terms are ORed; expressions within a term are ANDed