What Is a Service Mesh and How Does It Relate to Kubernetes Services?
A service mesh is an infrastructure layer that handles service-to-service communication by injecting a sidecar proxy into each Pod. It provides features such as mutual TLS, advanced load balancing, traffic splitting, and observability that go beyond what native Kubernetes Services offer.
What Is a Service Mesh?
A service mesh is a dedicated infrastructure layer for managing service-to-service communication within a Kubernetes cluster. It works by deploying a proxy (usually as a sidecar container) inside every application Pod, alongside the application container. These proxies intercept all network traffic and apply policies for security, routing, reliability, and observability without requiring changes to application code.
A service mesh sits on top of native Kubernetes Services, using them for service discovery while adding capabilities that Services alone cannot provide.
What Native Services Lack
Kubernetes Services provide basic load balancing and service discovery, but they have limitations in production environments:
| Capability | Native K8s Services | Service Mesh |
|---|---|---|
| Service discovery | Yes (DNS) | Yes (DNS + proxy registry) |
| Basic load balancing | Yes (random/round-robin) | Yes (+ least connections, weighted, etc.) |
| Mutual TLS (mTLS) | No | Yes |
| Canary / traffic splitting | No | Yes |
| Retries and timeouts | No | Yes |
| Circuit breaking | No | Yes |
| Distributed tracing | No | Yes |
| Traffic mirroring | No | Yes |
| Rate limiting | No | Yes |
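For contrast, this is everything a plain Service gives you: a stable DNS name and basic load balancing across matching Pods. The names and port are illustrative:

```yaml
# A minimal native Service: DNS-based discovery and basic load balancing,
# none of the right-hand-column features above. Names are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: reviews
  namespace: production
spec:
  selector:
    app: reviews
  ports:
  - port: 9080
    targetPort: 9080
```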
Architecture: Data Plane and Control Plane
A service mesh consists of two parts: a control plane and a data plane.
┌──────────────────────────────────────────────────────────┐
│ Control Plane │
│ (Istiod / Linkerd control plane) │
│ - Distributes configuration to proxies │
│ - Manages certificates for mTLS │
│ - Collects telemetry data │
└──────────────┬───────────────────────────────────────────┘
│ configuration + certs
▼
┌──────────────────────────────────────────────────────────┐
│ Data Plane │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Pod A │ │ Pod B │ │
│ │ ┌─────────────┐ │ │ ┌─────────────┐ │ │
│ │ │ App │ │ │ │ App │ │ │
│ │ │ Container │ │ │ │ Container │ │ │
│ │ └──────┬──────┘ │ │ └──────┬──────┘ │ │
│ │ │ │ │ │ │ │
│ │ ┌──────▼──────┐ │ mTLS │ ┌──────▼──────┐ │ │
│ │ │ Sidecar │ │◄──────►│ │ Sidecar │ │ │
│ │ │ Proxy │ │ │ │ Proxy │ │ │
│ │ └─────────────┘ │ │ └─────────────┘ │ │
│ └─────────────────┘ └─────────────────┘ │
└──────────────────────────────────────────────────────────┘
The control plane configures all proxies and manages certificates. The data plane consists of the sidecar proxies that handle actual traffic.
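In Istio, for example, the data plane is populated by enabling automatic sidecar injection on a namespace and recreating the Pods (the namespace name here is an assumption):

```shell
# Label the namespace so Istio injects an Envoy sidecar into new Pods,
# then restart workloads so existing Pods are recreated with the sidecar
kubectl label namespace production istio-injection=enabled
kubectl rollout restart deployment -n production
```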
Key Service Mesh Features
Mutual TLS (mTLS)
The mesh automatically encrypts all service-to-service traffic and verifies both client and server identities:
# Istio PeerAuthentication -- enforce mTLS for all services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
With this policy, all traffic between Pods in the production namespace must be encrypted with mTLS. No application code changes are needed because the sidecar proxies handle certificate exchange.
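The namespace-wide default can be relaxed for individual workloads. As a sketch (the app label is hypothetical), a selector-scoped PeerAuthentication lets one legacy service keep accepting plaintext while everything else stays strict:

```yaml
# Workload-level override: PERMISSIVE accepts both mTLS and plaintext.
# The matchLabels value is an assumption for illustration.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-billing-override
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-billing
  mtls:
    mode: PERMISSIVE
```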
Traffic Splitting (Canary Deployments)
Route a percentage of traffic to a new version while keeping most traffic on the stable version:
# Istio VirtualService -- 90/10 canary split
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
This sends 90% of traffic to v1 and 10% to v2. Native Kubernetes Services have no equivalent; the closest approximation is adjusting replica counts between versions, which only yields coarse, Pod-granularity ratios.
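Weight-based splits can also be combined with match rules. A common pattern (the header name here is an assumption) routes requests carrying a test header to the canary while all other traffic stays on the stable version:

```yaml
# Header-based routing: requests with x-canary: "true" go to v2,
# everything else goes to v1. The header name is illustrative.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
```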
Retries and Timeouts
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure
    timeout: 10s
The proxy automatically retries failed requests up to 3 times, with a 2-second timeout per attempt and a 10-second cap on the overall request, including all retries.
Circuit Breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
If a Pod returns 5 consecutive 5xx errors, it is ejected from the load balancing pool for 60 seconds.
Observability
Service meshes automatically collect metrics, traces, and logs for all traffic:
# Istio automatically exposes Prometheus metrics
# Request count by source, destination, and response code
istio_requests_total{source_workload="frontend", destination_workload="backend", response_code="200"}
# Request duration histogram
istio_request_duration_milliseconds_bucket{...}
These metrics are generated by the sidecar proxies without instrumenting application code.
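Because the metrics follow standard Prometheus conventions, they plug directly into PromQL. As a sketch (the workload name is an assumption), the server-side error rate for a service over the last five minutes:

```promql
# Fraction of requests to "backend" that returned a 5xx in the last 5 minutes
sum(rate(istio_requests_total{destination_workload="backend", response_code=~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_workload="backend"}[5m]))
```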
Popular Service Mesh Implementations
| Mesh | Proxy | Key Differentiator |
|---|---|---|
| Istio | Envoy | Feature-rich, large community, complex |
| Linkerd | linkerd2-proxy (Rust) | Lightweight, simpler, lower resource usage |
| Cilium | eBPF (no sidecar) | Sidecar-less, kernel-level routing |
Linkerd Example
# Install Linkerd
linkerd install | kubectl apply -f -
# Inject sidecar into a namespace
kubectl annotate namespace production linkerd.io/inject=enabled
# Restart Pods to pick up the sidecar
kubectl rollout restart deployment -n production
Linkerd is known for being simpler to operate than Istio while covering most use cases.
When to Use a Service Mesh
Use a mesh when you need:
- Encryption between all services (zero-trust networking)
- Canary deployments and traffic splitting
- Detailed request-level metrics and distributed tracing
- Cross-team policy enforcement (rate limits, retries, circuit breaking)
Skip a mesh when:
- Your cluster has fewer than 10 services
- You do not need mTLS (cluster network is trusted)
- You do not need advanced traffic management
- Resource overhead matters (each sidecar consumes CPU and memory)
Resource Overhead
Each sidecar proxy adds resource consumption:
| Proxy | Typical Memory | Typical CPU |
|---|---|---|
| Envoy (Istio) | 50-100 MB per Pod | 10-50m per Pod |
| linkerd2-proxy | 10-20 MB per Pod | 1-10m per Pod |
| Cilium eBPF | No per-Pod overhead | Minimal kernel overhead |
In a cluster with 500 Pods, Istio sidecars add roughly 25-50 GB of memory overhead across the cluster.
Summary
A service mesh extends Kubernetes Services with advanced networking capabilities including mTLS, traffic splitting, retries, circuit breaking, and observability. It works by deploying sidecar proxies that intercept all traffic. While powerful, it adds complexity and resource overhead. The decision to adopt a service mesh should be driven by concrete requirements around security, traffic management, or observability that native Kubernetes Services cannot fulfill.
Why Interviewers Ask This
Service meshes are increasingly common in production Kubernetes environments. Interviewers ask this to assess whether candidates understand the limitations of native Services and when a service mesh adds value versus unnecessary complexity.
Key Takeaways
- A service mesh adds a proxy layer on top of Kubernetes Services for advanced networking features.
- Key capabilities include mTLS, traffic splitting, retries, circuit breaking, and distributed tracing.
- Service meshes add complexity and resource overhead; they are not needed for every cluster.