What Is a Service Mesh and How Does It Relate to Kubernetes Services?
A service mesh is an infrastructure layer that handles service-to-service communication by injecting a sidecar proxy into each Pod. It provides features such as mutual TLS, advanced load balancing, traffic splitting, and observability that go beyond what native Kubernetes Services offer.
What Is a Service Mesh?
A service mesh is a dedicated infrastructure layer for managing service-to-service communication within a Kubernetes cluster. It works by deploying a proxy (usually as a sidecar container) inside every application Pod, alongside the application container. These proxies intercept all network traffic and apply policies for security, routing, reliability, and observability without requiring changes to application code.
A service mesh sits on top of native Kubernetes Services, using them for service discovery while adding capabilities that Services alone cannot provide.
What Native Services Lack
Kubernetes Services provide basic load balancing and service discovery, but they have limitations in production environments:
| Capability | Native K8s Services | Service Mesh |
|---|---|---|
| Service discovery | Yes (DNS) | Yes (DNS + proxy registry) |
| Basic load balancing | Yes (random/round-robin) | Yes (+ least connections, weighted, etc.) |
| Mutual TLS (mTLS) | No | Yes |
| Canary / traffic splitting | No | Yes |
| Retries and timeouts | No | Yes |
| Circuit breaking | No | Yes |
| Distributed tracing | No | Yes |
| Traffic mirroring | No | Yes |
| Rate limiting | No | Yes |
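For contrast, this is everything a plain Service gives you: a stable DNS name and basic load balancing across matching Pods. The names and port are illustrative:

```yaml
# A minimal native Service: DNS-based discovery and basic load balancing,
# none of the right-hand-column features above. Names are assumptions.
apiVersion: v1
kind: Service
metadata:
  name: reviews
  namespace: production
spec:
  selector:
    app: reviews
  ports:
  - port: 9080
    targetPort: 9080
```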
Architecture: Data Plane and Control Plane
A service mesh consists of two parts: a control plane and a data plane.
┌──────────────────────────────────────────────────────────┐
│ Control Plane │
│ (Istiod / Linkerd control plane) │
│ - Distributes configuration to proxies │
│ - Manages certificates for mTLS │
│ - Collects telemetry data │
└──────────────┬───────────────────────────────────────────┘
│ configuration + certs
▼
┌──────────────────────────────────────────────────────────┐
│ Data Plane │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Pod A │ │ Pod B │ │
│ │ ┌─────────────┐ │ │ ┌─────────────┐ │ │
│ │ │ App │ │ │ │ App │ │ │
│ │ │ Container │ │ │ │ Container │ │ │
│ │ └──────┬──────┘ │ │ └──────┬──────┘ │ │
│ │ │ │ │ │ │ │
│ │ ┌──────▼──────┐ │ mTLS │ ┌──────▼──────┐ │ │
│ │ │ Sidecar │ │◄──────►│ │ Sidecar │ │ │
│ │ │ Proxy │ │ │ │ Proxy │ │ │
│ │ └─────────────┘ │ │ └─────────────┘ │ │
│ └─────────────────┘ └─────────────────┘ │
└──────────────────────────────────────────────────────────┘
The control plane configures all proxies and manages certificates. The data plane consists of the sidecar proxies that handle actual traffic.
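In Istio, for example, the data plane is populated by enabling automatic sidecar injection on a namespace and recreating the Pods (the namespace name here is an assumption):

```shell
# Label the namespace so Istio injects an Envoy sidecar into new Pods,
# then restart workloads so existing Pods are recreated with the sidecar
kubectl label namespace production istio-injection=enabled
kubectl rollout restart deployment -n production
```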
Key Service Mesh Features
Mutual TLS (mTLS)
The mesh automatically encrypts all service-to-service traffic and verifies both client and server identities:
# Istio PeerAuthentication -- enforce mTLS for all services
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: production
spec:
  mtls:
    mode: STRICT
With this policy, all traffic between Pods in the production namespace must be encrypted with mTLS. No application code changes are needed because the sidecar proxies handle certificate exchange.
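The namespace-wide default can be relaxed for individual workloads. As a sketch (the app label is hypothetical), a selector-scoped PeerAuthentication lets one legacy service keep accepting plaintext while everything else stays strict:

```yaml
# Workload-level override: PERMISSIVE accepts both mTLS and plaintext.
# The matchLabels value is an assumption for illustration.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: legacy-billing-override
  namespace: production
spec:
  selector:
    matchLabels:
      app: legacy-billing
  mtls:
    mode: PERMISSIVE
```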
Traffic Splitting (Canary Deployments)
Route a percentage of traffic to a new version while keeping most traffic on the stable version:
# Istio VirtualService -- 90/10 canary split
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1
      weight: 90
    - destination:
        host: reviews
        subset: v2
      weight: 10
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
This sends 90% of traffic to v1 and 10% to v2. Native Kubernetes Services have no equivalent; the closest approximation is adjusting replica counts between versions, which only yields coarse, Pod-granularity ratios.
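Weight-based splits can also be combined with match rules. A common pattern (the header name here is an assumption) routes requests carrying a test header to the canary while all other traffic stays on the stable version:

```yaml
# Header-based routing: requests with x-canary: "true" go to v2,
# everything else goes to v1. The header name is illustrative.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        x-canary:
          exact: "true"
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
```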
Retries and Timeouts
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
    retries:
      attempts: 3
      perTryTimeout: 2s
      retryOn: 5xx,connect-failure
    timeout: 10s
The proxy automatically retries failed requests up to 3 times, with a 2-second timeout per attempt and a 10-second cap on the overall request, including all retries.
Circuit Breaking
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 100
      http:
        h2UpgradePolicy: DEFAULT
        http1MaxPendingRequests: 50
        http2MaxRequests: 100
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
      maxEjectionPercent: 50
If a Pod returns 5 consecutive 5xx errors, it is ejected from the load balancing pool for 60 seconds.
Observability
Service meshes automatically collect metrics, traces, and logs for all traffic:
# Istio automatically exposes Prometheus metrics
# Request count by source, destination, and response code
istio_requests_total{source_workload="frontend", destination_workload="backend", response_code="200"}
# Request duration histogram
istio_request_duration_milliseconds_bucket{...}
These metrics are generated by the sidecar proxies without instrumenting application code.
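Because the metrics follow standard Prometheus conventions, they plug directly into PromQL. As a sketch (the workload name is an assumption), the server-side error rate for a service over the last five minutes:

```promql
# Fraction of requests to "backend" that returned a 5xx in the last 5 minutes
sum(rate(istio_requests_total{destination_workload="backend", response_code=~"5.."}[5m]))
/
sum(rate(istio_requests_total{destination_workload="backend"}[5m]))
```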
Popular Service Mesh Implementations
| Mesh | Proxy | Key Differentiator |
|---|---|---|
| Istio | Envoy | Feature-rich, large community, complex |
| Linkerd | linkerd2-proxy (Rust) | Lightweight, simpler, lower resource usage |
| Cilium | eBPF (no sidecar) | Sidecar-less, kernel-level routing |
Linkerd Example
# Install Linkerd
linkerd install | kubectl apply -f -
# Inject sidecar into a namespace
kubectl annotate namespace production linkerd.io/inject=enabled
# Restart Pods to pick up the sidecar
kubectl rollout restart deployment -n production
Linkerd is known for being simpler to operate than Istio while covering most use cases.
When to Use a Service Mesh
Use a mesh when you need:
- Encryption between all services (zero-trust networking)
- Canary deployments and traffic splitting
- Detailed request-level metrics and distributed tracing
- Cross-team policy enforcement (rate limits, retries, circuit breaking)
Skip a mesh when:
- Your cluster has fewer than 10 services
- You do not need mTLS (cluster network is trusted)
- You do not need advanced traffic management
- Resource overhead matters (each sidecar consumes CPU and memory)
Resource Overhead
Each sidecar proxy adds resource consumption:
| Proxy | Typical Memory | Typical CPU |
|---|---|---|
| Envoy (Istio) | 50-100 MB per Pod | 10-50m per Pod |
| linkerd2-proxy | 10-20 MB per Pod | 1-10m per Pod |
| Cilium eBPF | No per-Pod overhead | Minimal kernel overhead |
In a cluster with 500 Pods, Istio sidecars add roughly 25-50 GB of memory overhead across the cluster.
Summary
A service mesh extends Kubernetes Services with advanced networking capabilities including mTLS, traffic splitting, retries, circuit breaking, and observability. It works by deploying sidecar proxies that intercept all traffic. While powerful, it adds complexity and resource overhead. The decision to adopt a service mesh should be driven by concrete requirements around security, traffic management, or observability that native Kubernetes Services cannot fulfill.
Why Interviewers Ask This
Service meshes are increasingly common in production Kubernetes environments. Interviewers ask this to assess whether candidates understand the limitations of native Services and when a service mesh adds value versus unnecessary complexity.
Key Takeaways
- A service mesh adds a proxy layer on top of Kubernetes Services for advanced networking features.
- Key capabilities include mTLS, traffic splitting, retries, circuit breaking, and distributed tracing.
- Service meshes add complexity and resource overhead; they are not needed for every cluster.