How Does kube-proxy Route Traffic to Services?
kube-proxy runs on every node and watches the API server for Service and EndpointSlice changes. It programs the node's networking stack (iptables, IPVS, or nftables) to intercept traffic destined for Service ClusterIPs and perform DNAT to forward it to a healthy backend Pod.
What Is kube-proxy?
kube-proxy is a network component that runs on every node in a Kubernetes cluster. Despite its name, in modern Kubernetes it does not actually proxy traffic. Instead, it watches the API server for Service and EndpointSlice resources and programs the node's kernel networking stack to intercept and redirect traffic destined for Service virtual IPs.
kube-proxy supports three modes: iptables, IPVS, and nftables.
The Role of kube-proxy
┌────────────────────────────────────────────────┐
│ API Server │
│ (Services, EndpointSlices) │
└──────────────┬─────────────────────────────────┘
│ watch events
▼
┌──────────────────────────────┐
│ kube-proxy (on each node) │
│ - Receives updates │
│ - Programs networking rules │
└──────────────┬───────────────┘
│
▼
┌──────────────────────────────┐
│ Kernel networking stack │
│ (iptables / IPVS / nftables)│
│ - Intercepts ClusterIP pkts │
│ - Performs DNAT to Pod IP │
└──────────────────────────────┘
When a Pod sends a packet to a Service's ClusterIP (e.g., 10.96.55.120:80), the kernel rules match the destination and rewrite it to a backend Pod's IP and port. kube-proxy itself is not in the data path.
iptables Mode (Default)
In iptables mode, kube-proxy creates a chain of iptables rules for each Service:
# View the rules kube-proxy creates (on a node)
iptables -t nat -L KUBE-SERVICES -n | head -20
For a Service with three endpoints, the rules look conceptually like this:
Chain KUBE-SVC-XXXXX (Service: my-app, ClusterIP: 10.96.55.120:80)
├── 33% probability -> DNAT to 10.244.1.5:8080 (Pod 1)
├── 50% probability -> DNAT to 10.244.2.8:8080 (Pod 2)
└── remainder -> DNAT to 10.244.3.12:8080 (Pod 3)
The probabilities are chosen so each Pod receives an equal 1/3 share: the first rule matches 1/3 of connections, the second matches 1/2 of the remainder, and the last rule catches everything left.
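To see where the 33%/50% figures come from, here is a small illustrative sketch (not kube-proxy source code) of the per-rule probability 1/(N - i + 1) that kube-proxy encodes with iptables' `-m statistic --mode random --probability` match:

```shell
# For N endpoints, rule i matches 1/(N-i+1) of the traffic that earlier
# rules did not take, so every Pod ends up with 1/N overall.
prob_percent() {
  rule=$1
  total=$2
  remaining=$(( total - rule + 1 ))
  echo $(( 100 / remaining ))
}
prob_percent 1 3   # 33  (rule 1: probability 0.333...)
prob_percent 2 3   # 50  (rule 2: half of what remains)
prob_percent 3 3   # 100 (last rule takes whatever is left)
```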
Pros and Cons of iptables Mode
| Pros | Cons |
|---|---|
| Default, widely tested | O(n) rule traversal per packet |
| No additional kernel modules needed | Rule update latency with many Services |
| Simple to debug with iptables -L | No advanced load-balancing algorithms |
| Reliable | Performance degrades past ~5,000 Services |
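The O(n) concern in the table above can be made concrete with a rough back-of-envelope model. Assuming a couple of bookkeeping rules per Service plus one DNAT rule per endpoint (the exact counts vary by kube-proxy version and enabled features), the nat table grows multiplicatively:

```shell
# Hypothetical cost model, not measured from a real cluster:
# ~2 bookkeeping rules per Service plus 1 DNAT rule per endpoint.
estimate_rules() {
  services=$1
  endpoints_per_service=$2
  echo $(( services * (2 + endpoints_per_service) ))
}
estimate_rules 100 3    # 500
estimate_rules 5000 3   # 25000
```

Every packet to an unmatched destination may traverse a large fraction of these rules, and every endpoint change forces kube-proxy to rewrite them, which is why IPVS and nftables scale better.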
IPVS Mode
IPVS (IP Virtual Server) is a kernel-level Layer 4 load balancer. kube-proxy in IPVS mode creates IPVS virtual servers for each Service ClusterIP and registers Pod IPs as real servers:
# View IPVS rules (on a node)
ipvsadm -Ln
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port       Forward Weight ActiveConn InActConn
TCP  10.96.55.120:80 rr
  -> 10.244.1.5:8080          Masq    1      0          0
  -> 10.244.2.8:8080          Masq    1      0          0
  -> 10.244.3.12:8080         Masq    1      0          0
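As a userspace sketch (the real scheduling happens in-kernel), round-robin over the three real servers behaves like the following, with backend addresses copied from the ipvsadm output above:

```shell
backends="10.244.1.5:8080 10.244.2.8:8080 10.244.3.12:8080"

# pick_backend CONN_NUMBER: the backend "rr" would choose for the Nth
# connection (0-based). Illustrative only, not how IPVS is implemented.
pick_backend() {
  conn=$1
  set -- $backends
  n=$#
  idx=$(( (conn % n) + 1 ))
  eval "echo \"\${$idx}\""
}
pick_backend 0   # 10.244.1.5:8080
pick_backend 1   # 10.244.2.8:8080
pick_backend 3   # wraps around to 10.244.1.5:8080
```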
Enabling IPVS Mode
# kube-proxy ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    mode: "ipvs"
    ipvs:
      scheduler: "rr"  # round-robin (default)
      # Other options: lc (least connections), dh (destination hashing),
      # sh (source hashing), sed (shortest expected delay), nq (never queue)
After updating, restart kube-proxy:
kubectl rollout restart daemonset kube-proxy -n kube-system
Pros and Cons of IPVS Mode
| Pros | Cons |
|---|---|
| O(1) connection routing via hash tables | Requires IPVS kernel modules |
| Multiple scheduling algorithms | Slightly more complex debugging |
| Handles tens of thousands of Services | Less widely used than iptables |
| Better throughput and latency at scale | Requires ipvsadm tool for inspection |
nftables Mode
Starting in Kubernetes 1.29 (initially as an alpha feature), kube-proxy gained support for nftables mode as a modern replacement for iptables mode:
# kube-proxy ConfigMap
data:
  config.conf: |
    mode: "nftables"
nftables is the Linux kernel's successor to iptables and evaluates rules more efficiently: kube-proxy's nftables mode uses nftables maps rather than long linear rule chains, so lookup cost stays roughly constant as the number of Services grows. It provides better performance than iptables mode while being conceptually similar.
How kube-proxy Handles Service Updates
- A new Pod matching a Service selector starts.
- The EndpointSlice controller (part of kube-controller-manager) creates or updates the EndpointSlice.
- kube-proxy on every node receives the watch event.
- kube-proxy updates the local iptables/IPVS/nftables rules.
- New connections are routed to the new endpoint.
Timeline:
t=0 Pod starts, passes readiness probe
t=0+ EndpointSlice updated with new Pod IP
t=0+ kube-proxy receives watch notification
t=0+ Rules updated on all nodes
t=0+ New connections can reach the Pod
The propagation delay is typically under a second but can vary in large clusters.
Debugging kube-proxy
Check kube-proxy Mode
# Check which mode kube-proxy is using
kubectl get configmap kube-proxy -n kube-system -o yaml | grep mode
# Or check logs
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=20 | grep "Using"
Verify Rules on a Node
# iptables mode: look for KUBE-SVC chains
iptables -t nat -L KUBE-SERVICES -n
# IPVS mode: list virtual servers
ipvsadm -Ln
# nftables mode: list rules
nft list table ip kube-proxy
Common Issues
| Symptom | Likely Cause |
|---|---|
| Service unreachable from Pods | kube-proxy not running or rules not synced |
| Intermittent timeouts | One backend Pod is unhealthy; check readiness probes |
| All traffic goes to one Pod | Session affinity is set, or only one endpoint exists |
| Stale endpoints after scale-down | kube-proxy rule sync delay; check logs for errors |
kube-proxy Alternatives: eBPF
Modern CNI plugins like Cilium can replace kube-proxy entirely using eBPF programs attached directly to the Linux kernel's networking stack:
# Cilium Helm values to replace kube-proxy
kubeProxyReplacement: true
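Fleshing that value out slightly, a minimal values sketch might look like the following (the API server host shown is a placeholder; with kube-proxy removed, Cilium can no longer rely on the kube-proxy-programmed ClusterIP of the `kubernetes` Service and must be pointed at the API server directly):

```yaml
kubeProxyReplacement: true
# Direct route to the API server, since no ClusterIP rules exist yet:
k8sServiceHost: api.example.internal   # placeholder for your API server
k8sServicePort: 6443
```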
Benefits of eBPF-based routing:
- No iptables or IPVS rule management overhead
- Lower latency and higher throughput
- Better observability with built-in metrics
- Support for advanced features like socket-level load balancing
kube-proxy Configuration Reference
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"  # iptables | ipvs | nftables
clusterCIDR: "10.244.0.0/16"
ipvs:
  scheduler: "rr"
  syncPeriod: "30s"
  minSyncPeriod: "2s"
iptables:
  syncPeriod: "30s"
  minSyncPeriod: "2s"
  masqueradeAll: false
conntrack:
  maxPerCore: 32768
  min: 131072
The syncPeriod controls how often kube-proxy re-syncs all rules, while minSyncPeriod controls the minimum time between syncs triggered by watch events.
Summary
kube-proxy is the component responsible for translating Kubernetes Service definitions into actual networking rules on every node. It supports iptables, IPVS, and nftables modes, each with different performance characteristics. For large clusters, IPVS or nftables mode is recommended over iptables. For cutting-edge deployments, eBPF-based alternatives like Cilium can replace kube-proxy entirely, providing better performance and observability.
Why Interviewers Ask This
Understanding kube-proxy is critical for debugging networking issues, optimizing performance in large clusters, and making informed decisions about proxy modes. Interviewers use this question to gauge deep networking knowledge.
Key Takeaways
- kube-proxy is a node-level component that translates Service definitions into networking rules.
- Three modes exist: iptables (default), IPVS (for large clusters), and nftables (newer).
- In all three modes (iptables, IPVS, nftables), kube-proxy does not proxy traffic itself; it programs the kernel to do it.