How Do You Tune Kubernetes Network Performance?
Kubernetes network performance tuning involves optimizing kube-proxy mode (IPVS over iptables at scale), tuning DNS (lowering ndots), configuring MTU correctly, using eBPF-based CNIs, and addressing conntrack table exhaustion.
Detailed Answer
Network performance in Kubernetes is affected by multiple layers: the CNI plugin, kube-proxy mode, DNS resolution, kernel parameters, and application-level configuration. Tuning each layer can dramatically improve throughput and latency.
1. kube-proxy Mode: iptables vs. IPVS
The default iptables mode processes rules linearly. With thousands of Services, every new connection walks through thousands of rules:
```text
# iptables: O(n) - 10,000 Services = 10,000+ rules to walk per connection
# IPVS:     O(1) - hash-table lookup regardless of Service count
```
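The difference is easy to demonstrate outside Kubernetes. The sketch below is a toy model in plain Python (not real iptables or IPVS code): it compares a linear scan over 10,000 Service entries, as iptables does per rule chain, with a hash-table lookup, as IPVS does.

```python
import timeit

# Toy model: 10,000 Service VIP:port entries.
rules = [f"10.96.{i // 256}.{i % 256}:80" for i in range(10_000)]
table = {rule: f"backend-{i}" for i, rule in enumerate(rules)}  # IPVS-style hash table

target = rules[-1]  # worst case for the linear scan

linear = timeit.timeit(lambda: target in rules, number=1_000)  # O(n), like iptables
hashed = timeit.timeit(lambda: target in table, number=1_000)  # O(1), like IPVS

print(f"linear scan is roughly {linear / hashed:.0f}x slower than the hash lookup")
```

The gap widens linearly with Service count for the scan, while the hash lookup stays flat, which is the core of the scaling argument for IPVS.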
Switch to IPVS for clusters with >1000 Services:
```yaml
# kube-proxy ConfigMap
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "lc"       # least-connection
  syncPeriod: "30s"
  minSyncPeriod: "2s"
```
Alternatively, use Cilium's eBPF-based kube-proxy replacement, which bypasses both iptables and IPVS:
```shell
# Cilium with kube-proxy replacement
helm install cilium cilium/cilium \
  --set kubeProxyReplacement=true
```
2. DNS Performance
DNS is often the hidden bottleneck. The default ndots:5 setting causes excessive queries for external names.
Reduce ndots
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  containers:
    - name: app
      image: myapp:1.0
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"
```
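To see why ndots matters, the Python sketch below models the resolver's search-list behavior (the search domains are an assumption matching a typical Pod's resolv.conf): a name with fewer dots than ndots is tried against every search domain before being tried as-is.

```python
# Typical Pod search path (assumed; the real list comes from resolv.conf).
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

def lookups(name: str, ndots: int) -> list[str]:
    """FQDNs tried in order, modeling resolver search-list semantics."""
    expanded = [f"{name}.{domain}" for domain in SEARCH]
    if name.count(".") >= ndots:
        return [name] + expanded   # absolute name tried first
    return expanded + [name]       # search domains tried (and fail) first

print(lookups("api.example.com", ndots=5))  # 3 wasted cluster queries, then the real one
print(lookups("api.example.com", ndots=2))  # real name tried first
```

With ndots:5, "api.example.com" (two dots) triggers three guaranteed-NXDOMAIN cluster lookups before the real query; with ndots:2 it is resolved on the first attempt.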
Scale CoreDNS
```yaml
# CoreDNS HPA for automatic scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```
Use NodeLocal DNSCache
NodeLocal DNSCache runs a DNS cache on every node, reducing latency and CoreDNS load:
```shell
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
```
Note that the upstream manifest contains `__PILLAR__` placeholder variables that typically need substituting with your cluster's DNS settings before applying.
3. Conntrack Table Tuning
Every connection through a Service creates a conntrack entry. When the table is full, new connections are silently dropped.
```shell
# Check current conntrack usage
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max

# Increase conntrack table size
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_buckets=262144
```
Symptoms of conntrack exhaustion:
- Intermittent connection failures
- DNS resolution timeouts (UDP conntrack)
- `nf_conntrack: table full, dropping packet` messages in dmesg
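The bucket value above follows a common sizing rule of thumb (a convention, not a kernel requirement): set the hash bucket count to roughly a quarter of the entry limit, so the average hash-chain length stays around four. A quick Python check:

```python
def buckets_for(conntrack_max: int, avg_chain_len: int = 4) -> int:
    """Bucket count that keeps average hash-chain length at avg_chain_len."""
    return conntrack_max // avg_chain_len

# Matches the sysctls above: 1,048,576 entries -> 262,144 buckets.
print(buckets_for(1_048_576))
```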
4. MTU Configuration
Incorrect MTU causes packet fragmentation or drops, especially with overlay networks:
```text
Host MTU: 1500
VXLAN overhead: 50 bytes
Pod MTU: 1450 (1500 - 50)

Host MTU: 9000 (jumbo frames)
VXLAN overhead: 50 bytes
Pod MTU: 8950 (9000 - 50)
```
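The arithmetic generalizes to other encapsulations. A small helper (the IPIP overhead is a typical figure, an assumption not stated above; VXLAN's 50 bytes is from the examples):

```python
# Encapsulation overhead in bytes; ipip value is a typical figure (assumption).
ENCAP_OVERHEAD = {"vxlan": 50, "ipip": 20}

def pod_mtu(host_mtu: int, encap: str) -> int:
    """Pod MTU = host MTU minus encapsulation overhead."""
    return host_mtu - ENCAP_OVERHEAD[encap]

print(pod_mtu(1500, "vxlan"))  # 1450
print(pod_mtu(9000, "vxlan"))  # 8950
```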
Configure MTU in your CNI:
```yaml
# Calico MTU configuration for a VXLAN overlay
apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  vxlanMTU: 1450
```
```shell
# Cilium MTU configuration
# helm install cilium cilium/cilium --set mtu=1450
```
5. Kernel Parameter Tuning
Key sysctl parameters for high-throughput clusters:
```shell
# Increase socket buffer sizes
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# Increase connection backlog
sysctl -w net.core.somaxconn=32768
sysctl -w net.core.netdev_max_backlog=16384

# TCP tuning
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
sysctl -w net.ipv4.tcp_tw_reuse=1

# Increase local port range
sysctl -w net.ipv4.ip_local_port_range="1024 65535"
```
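One of these limits is easy to quantify: the local port range caps concurrent outbound connections to any single (destination IP, destination port) pair from one source IP, since each connection consumes one ephemeral source port. A quick check of the range set above:

```python
lo, hi = 1024, 65535  # net.ipv4.ip_local_port_range from the sysctl above

usable_ports = hi - lo + 1
print(usable_ports)  # concurrent-connection ceiling per (src IP, dst IP, dst port)
```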
Apply via a DaemonSet for consistency:
```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sysctl-tuner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: sysctl-tuner
  template:
    metadata:
      labels:
        app: sysctl-tuner
    spec:
      hostPID: true
      hostNetwork: true
      initContainers:
        - name: sysctl
          image: busybox:1.36
          command: ["sh", "-c"]
          args:
            - |
              sysctl -w net.core.somaxconn=32768
              sysctl -w net.netfilter.nf_conntrack_max=1048576
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: "10m"
              memory: "16Mi"
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "5m"
              memory: "8Mi"
```
6. CNI Performance Considerations
| CNI Mode | Throughput | Latency | Best For |
|----------|-----------|---------|----------|
| VXLAN overlay | Lower | Higher | Multi-subnet clusters |
| Direct routing (BGP) | Higher | Lower | Same-subnet or BGP-capable networks |
| eBPF | Highest | Lowest | Modern kernels (5.10+) |
| Host networking | Native | Native | Latency-critical workloads (bypass CNI) |
For latency-critical workloads, consider hostNetwork: true to bypass the CNI entirely, at the cost of port conflicts and reduced isolation.
Monitoring Network Performance
```shell
# Check for dropped packets
kubectl exec <pod> -- netstat -s | grep -i drop

# Monitor conntrack
watch -n 1 'sysctl net.netfilter.nf_conntrack_count'

# Test latency between Pods
kubectl exec pod-a -- ping <pod-b-ip>

# Benchmark throughput (requires an iperf3 server listening in pod-b)
kubectl exec pod-a -- iperf3 -c <pod-b-ip> -t 30
```
Why Interviewers Ask This
Network performance problems in Kubernetes are subtle and often misdiagnosed. This question tests your ability to identify and resolve performance bottlenecks at the infrastructure level.
Key Takeaways
- Switch to IPVS mode when you exceed 1000 Services to avoid iptables performance degradation.
- Lower ndots from 5 to 2-3 to reduce DNS query amplification for external names.
- Size conntrack tables based on your connection volume to prevent dropped connections.