How Do You Tune Kubernetes Network Performance?

advanced | networking | SRE | platform engineer | CKA
TL;DR

Kubernetes network performance tuning involves optimizing kube-proxy mode (IPVS over iptables at scale), tuning DNS (lowering ndots), configuring MTU correctly, using eBPF-based CNIs, and addressing conntrack table exhaustion.

Detailed Answer

Network performance in Kubernetes is affected by multiple layers: the CNI plugin, kube-proxy mode, DNS resolution, kernel parameters, and application-level configuration. Tuning each layer can dramatically improve throughput and latency.

1. kube-proxy Mode: iptables vs. IPVS

The default iptables mode processes rules linearly. With thousands of Services, every new connection walks through thousands of rules:

# iptables: O(n) — 10,000 Services = 10,000+ rules to walk
# IPVS:     O(1) — hash table lookup regardless of Service count

Switch to IPVS for clusters with >1000 Services:

# kube-proxy configuration (set in the kube-proxy ConfigMap)
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  scheduler: "lc"        # least-connection; "rr" (round-robin) is the default
  syncPeriod: "30s"
  minSyncPeriod: "2s"

Or use eBPF-based kube-proxy replacement with Cilium, bypassing both iptables and IPVS:

# Cilium with kube-proxy replacement
helm install cilium cilium/cilium \
  --set kubeProxyReplacement=true

2. DNS Performance

DNS is often the hidden bottleneck. The default ndots:5 setting causes excessive queries for external names.
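To see where the amplification comes from, here is a minimal sketch of the resolver's search-list expansion (the search list assumes a Pod in the `default` namespace; any name with fewer dots than `ndots` is tried against every search domain before being tried as-is):

```shell
# Sketch: search-list expansion for an external name under ndots:5
name="api.example.com"
search="default.svc.cluster.local svc.cluster.local cluster.local"
ndots=5

dots=$(( $(printf %s "$name" | tr -cd '.' | wc -c) ))
tried=""
if [ "$dots" -lt "$ndots" ]; then
  # Too few dots: every search domain is tried (and NXDOMAINs) first
  for domain in $search; do
    tried="$tried $name.$domain"
  done
fi
tried="$tried $name."            # the absolute name, tried last
echo "queries per lookup: $(echo $tried | wc -w)"
```

With ndots:5, this single lookup generates four DNS queries, three of which are guaranteed to fail; with ndots:2 the external name is resolved in one query.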

Reduce ndots

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"
  containers:
    - name: app
      image: myapp:1.0
      resources:
        requests:
          cpu: "100m"
          memory: "128Mi"

Scale CoreDNS

# CoreDNS HPA for automatic scaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

Use NodeLocal DNSCache

NodeLocal DNSCache runs a DNS cache on every node, reducing latency and CoreDNS load. The upstream manifest ships with __PILLAR__ placeholders that must be substituted before applying; the values below are typical examples (link-local listen address, cluster domain, kube-dns Service IP):

wget https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/dns/nodelocaldns/nodelocaldns.yaml
sed -i "s/__PILLAR__LOCAL__DNS__/169.254.20.10/g; s/__PILLAR__DNS__DOMAIN__/cluster.local/g; s/__PILLAR__DNS__SERVER__/10.96.0.10/g" nodelocaldns.yaml
kubectl apply -f nodelocaldns.yaml

3. Conntrack Table Tuning

Every connection through a Service creates a conntrack entry. When the table is full, new connections are silently dropped.

# Check current conntrack usage
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max

# Increase conntrack table size (rule of thumb: buckets = max / 4)
sysctl -w net.netfilter.nf_conntrack_max=1048576
sysctl -w net.netfilter.nf_conntrack_buckets=262144
# On kernels where nf_conntrack_buckets is read-only, set the hash size
# via the module parameter instead:
# echo 262144 > /sys/module/nf_conntrack/parameters/hashsize

Symptoms of conntrack exhaustion:

  • Intermittent connection failures
  • DNS resolution timeouts (UDP conntrack)
  • "nf_conntrack: table full, dropping packet" in dmesg
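Note that kube-proxy sizes the table itself at startup as max(conntrack-min, conntrack-max-per-core × CPU cores), so manual sysctl changes can be overwritten. A sketch of that arithmetic, with values assumed from kube-proxy's documented flag defaults:

```shell
# Sketch of kube-proxy conntrack sizing: max(floor, per_core * cores)
cores=16                 # hypothetical node size
per_core=32768           # assumed --conntrack-max-per-core default
floor=131072             # assumed --conntrack-min default
computed=$((per_core * cores))
if [ "$computed" -gt "$floor" ]; then max=$computed; else max=$floor; fi
echo "nf_conntrack_max: $max"
```

On this hypothetical 16-core node the result is 524288; if kube-proxy manages conntrack on your nodes, raise its flags rather than writing sysctls by hand.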

4. MTU Configuration

Incorrect MTU causes packet fragmentation or drops, especially with overlay networks:

Host MTU:     1500
VXLAN overhead: 50 bytes
Pod MTU:      1450 (1500 - 50)

Host MTU:     9000 (jumbo frames)
VXLAN overhead: 50 bytes
Pod MTU:      8950
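The same subtraction applies to other encapsulations; a quick sketch, with overhead values that are typical IPv4 figures from CNI documentation (they grow with IPv6 or extra headers):

```shell
# Pod MTU = host MTU minus encapsulation overhead (typical IPv4 figures)
host_mtu=1500
vxlan_mtu=$((host_mtu - 50))       # VXLAN: ~50 bytes
ipip_mtu=$((host_mtu - 20))        # IP-in-IP: 20 bytes
wireguard_mtu=$((host_mtu - 60))   # WireGuard: 60 bytes
echo "VXLAN=$vxlan_mtu IPIP=$ipip_mtu WireGuard=$wireguard_mtu"
```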

Configure MTU in your CNI:

# Calico MTU configuration (operator-managed install)
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    mtu: 1450
# Cilium MTU configuration
# helm install cilium cilium/cilium --set mtu=1450

5. Kernel Parameter Tuning

Key sysctl parameters for high-throughput clusters:

# Increase socket buffer sizes
sysctl -w net.core.rmem_max=16777216
sysctl -w net.core.wmem_max=16777216

# Increase connection backlog
sysctl -w net.core.somaxconn=32768
sysctl -w net.core.netdev_max_backlog=16384

# TCP tuning
sysctl -w net.ipv4.tcp_max_syn_backlog=8192
sysctl -w net.ipv4.tcp_slow_start_after_idle=0
sysctl -w net.ipv4.tcp_tw_reuse=1

# Increase local port range
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

Apply via a DaemonSet for consistency:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: sysctl-tuner
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: sysctl-tuner
  template:
    metadata:
      labels:
        app: sysctl-tuner
    spec:
      hostPID: true
      hostNetwork: true
      initContainers:
        - name: sysctl
          image: busybox:1.36
          command: ["sh", "-c"]
          args:
            - |
              sysctl -w net.core.somaxconn=32768
              sysctl -w net.netfilter.nf_conntrack_max=1048576
          securityContext:
            privileged: true
          resources:
            requests:
              cpu: "10m"
              memory: "16Mi"
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "5m"
              memory: "8Mi"

6. CNI Performance Considerations

| CNI Mode | Throughput | Latency | Best For |
|----------|-----------|---------|----------|
| VXLAN overlay | Lower | Higher | Multi-subnet clusters |
| Direct routing (BGP) | Higher | Lower | Same-subnet or BGP-capable networks |
| eBPF | Highest | Lowest | Modern kernels (5.10+) |
| Host networking | Native | Native | Latency-critical workloads (bypass CNI) |

For latency-critical workloads, consider hostNetwork: true to bypass the CNI entirely, at the cost of port conflicts and reduced isolation.
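A minimal sketch of such a Pod (names are hypothetical); note dnsPolicy, which keeps cluster DNS resolution working despite hostNetwork:

```yaml
# Hypothetical latency-critical Pod using the node's network namespace
apiVersion: v1
kind: Pod
metadata:
  name: fast-path
spec:
  hostNetwork: true                     # bypass the CNI data path
  dnsPolicy: ClusterFirstWithHostNet    # keep resolving cluster Services
  containers:
    - name: app
      image: myapp:1.0
      ports:
        - containerPort: 8080   # binds on the node itself: one such Pod per port per node
```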

Monitoring Network Performance

# Check for dropped packets
kubectl exec <pod> -- netstat -s | grep -i drop

# Monitor conntrack
watch -n 1 'sysctl net.netfilter.nf_conntrack_count'

# Test latency between Pods
kubectl exec pod-a -- ping -c 5 <pod-b-ip>

# Benchmark throughput (run "iperf3 -s" in pod-b first)
kubectl exec pod-a -- iperf3 -c <pod-b-ip> -t 30

Why Interviewers Ask This

Network performance problems in Kubernetes are subtle and often misdiagnosed. This question tests your ability to identify and resolve performance bottlenecks at the infrastructure level.

Common Follow-Up Questions

How does conntrack table exhaustion manifest?
You see random connection failures, DNS timeouts, and packet drops. Check with conntrack -C and compare against the nf_conntrack_max sysctl value.
What is the impact of MTU misconfiguration?
If the overlay MTU is too high, packets are silently fragmented or dropped, causing mysterious connection hangs. The overlay MTU should be the host MTU minus the encapsulation overhead.
When should you switch from iptables to IPVS kube-proxy mode?
When you have more than 1000 Services. iptables rule processing is O(n) while IPVS uses hash tables for O(1) lookup.

Key Takeaways

  • Switch to IPVS mode when you exceed 1000 Services to avoid iptables performance degradation.
  • Lower ndots from 5 to 2-3 to reduce DNS query amplification for external names.
  • Size conntrack tables based on your connection volume to prevent dropped connections.
