Requests vs Limits

Key Differences in Kubernetes

Requests define the minimum resources a container is guaranteed — the scheduler uses them to find a node with enough capacity. Limits define the maximum resources a container can use — exceeding CPU limits causes throttling, and exceeding memory limits causes an OOM kill. Set requests for scheduling and guaranteed baseline; set limits to prevent runaway resource consumption.

Side-by-Side Comparison

| Dimension          | Requests                                                                             | Limits                                                          |
|--------------------|--------------------------------------------------------------------------------------|-----------------------------------------------------------------|
| Purpose            | Guarantees a minimum amount of resources for the container                           | Caps the maximum amount of resources the container can use      |
| Scheduling         | Used by the scheduler to find a node with enough capacity                            | Not used for scheduling decisions                               |
| CPU Enforcement    | Guaranteed CPU time via CFS shares                                                   | Enforced via CFS quota; container is throttled when exceeded    |
| Memory Enforcement | Guaranteed memory allocation                                                         | Hard cap; container is OOM killed if it exceeds the limit       |
| QoS Class          | Determines QoS class together with limits                                            | Determines QoS class together with requests                     |
| Default Value      | Defaults to the limit if one is set, otherwise 0 (unless a LimitRange sets defaults) | Unlimited if not specified (unless a LimitRange sets defaults)  |
| Node Capacity      | Sum of all requests cannot exceed node allocatable resources                         | Sum of all limits can exceed node capacity (overcommit)         |
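The scheduling row deserves emphasis: only requests participate in placement. A minimal sketch of the fit check (a simplified model for illustration, not the real scheduler; all names and quantities are made up):

```python
# Simplified sketch of the scheduler's resource fit check:
# a Pod fits on a node if its requests fit into the node's allocatable
# capacity minus the requests of Pods already placed there.
# Limits are deliberately ignored -- they play no role in scheduling.

def fits(node_allocatable, placed_requests, pod_requests):
    """Quantities in millicores (cpu) and bytes (memory)."""
    for resource, requested in pod_requests.items():
        used = sum(p.get(resource, 0) for p in placed_requests)
        if used + requested > node_allocatable.get(resource, 0):
            return False
    return True

node = {"cpu": 4000, "memory": 8 * 1024**3}      # 4 cores, 8Gi allocatable
placed = [{"cpu": 3500, "memory": 2 * 1024**3}]  # existing Pods' requests

print(fits(node, placed, {"cpu": 100, "memory": 256 * 1024**2}))  # True
print(fits(node, placed, {"cpu": 600, "memory": 256 * 1024**2}))  # False (CPU)
```

Note that the check uses requested, not actual, usage: a node full of idle Pods with large requests still rejects new Pods.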

Detailed Breakdown

Setting Requests and Limits

apiVersion: v1
kind: Pod
metadata:
  name: web-app
spec:
  containers:
    - name: app
      image: my-app:1.0.0
      resources:
        requests:
          cpu: 100m       # 100 millicores (0.1 CPU)
          memory: 256Mi   # 256 mebibytes
        limits:
          cpu: 500m       # 500 millicores (0.5 CPU)
          memory: 512Mi   # 512 mebibytes

This container:

  • Is guaranteed 100m CPU and 256Mi memory (requests)
  • Can burst up to 500m CPU and 512Mi memory (limits)
  • Will be throttled if it tries to use more than 500m CPU
  • Will be OOM killed if it tries to use more than 512Mi memory
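The asymmetry in the last two bullets can be captured in a tiny sketch (illustrative only; real enforcement happens in the kernel via cgroups):

```python
# What happens when a container exceeds a limit, by resource type.
# CPU is compressible (the container is throttled); memory is
# incompressible (the container is OOM killed).

def enforcement(resource, usage, limit):
    if limit is None or usage <= limit:
        return "running"
    return "throttled" if resource == "cpu" else "oom-killed"

print(enforcement("cpu", 800, 500))     # throttled
print(enforcement("memory", 600, 512))  # oom-killed
print(enforcement("cpu", 300, 500))     # running
```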

How CPU Requests and Limits Work

CPU is a compressible resource — Kubernetes can throttle it without killing the container.

Requests translate to CFS (Completely Fair Scheduler) shares. A container with 100m CPU gets proportional CPU time when the node is contended. On an idle node, it can use more than its request.

Limits translate to CFS quota. The container gets a maximum amount of CPU time per 100ms period. If the container tries to use 500m but its limit is 200m, the kernel pauses it for a portion of each period.

# View the CFS settings for a container (cgroup v1 paths; on cgroup v2 both values live in cpu.max)
cat /sys/fs/cgroup/cpu/cpu.cfs_period_us   # 100000 (100ms)
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us    # 50000 (500m limit = 50ms per 100ms period)
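The conversion from a CPU limit to a CFS quota is simple arithmetic; a sketch of the calculation the kubelet effectively performs:

```python
# Convert a Kubernetes CPU limit (in millicores) into a CFS quota:
# quota_us = millicores / 1000 * period_us.

CFS_PERIOD_US = 100_000  # 100ms, the default CFS period

def cfs_quota_us(limit_millicores, period_us=CFS_PERIOD_US):
    return limit_millicores * period_us // 1000

print(cfs_quota_us(500))   # 50000 -> 50ms of CPU time per 100ms period
print(cfs_quota_us(2000))  # 200000 -> two full cores' worth per period
```

Once a container burns through its quota for the current period, it runs nothing until the next period starts; this is the mechanism behind throttling-induced latency spikes.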

How Memory Requests and Limits Work

Memory is an incompressible resource — Kubernetes cannot reclaim memory without killing the process.

Requests are a scheduling reservation, not a runtime one: the scheduler ensures the sum of all memory requests on a node does not exceed the node's allocatable memory, but the kernel does not set aside physical pages for the container.

Limits set a hard cap via cgroups. If the container's memory usage exceeds the limit, the kernel OOM killer terminates it. The kubelet then restarts the container based on the Pod's restart policy.

# This container will be OOM killed if it uses more than 512Mi
resources:
  limits:
    memory: 512Mi

QoS Classes

Kubernetes assigns one of three Quality of Service classes based on requests and limits:

Guaranteed — requests equal limits for all containers:

resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m
    memory: 256Mi

Guaranteed Pods are the last to be evicted under node pressure. Use for critical workloads like databases.

Burstable — the Pod does not qualify for Guaranteed, but at least one container has a CPU or memory request or limit set:

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

Burstable Pods are evicted after BestEffort Pods when the node is under pressure.

BestEffort — no requests or limits set:

resources: {}

BestEffort Pods are the first to be evicted under memory pressure. Avoid this in production.

# Check a Pod's QoS class
kubectl get pod web-app -o jsonpath='{.status.qosClass}'
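The three rules can be condensed into a short classifier (a simplified sketch of the documented rules; in a real cluster, admission first defaults missing requests to the limits, so a Pod with only limits set still classifies as Guaranteed):

```python
# Simplified QoS classification:
# - Guaranteed: every container has cpu+memory requests equal to limits
# - BestEffort: no container has any request or limit at all
# - Burstable:  everything in between

def qos_class(containers):
    def has_any(c):
        return bool(c.get("requests")) or bool(c.get("limits"))

    if not any(has_any(c) for c in containers):
        return "BestEffort"
    for c in containers:
        req, lim = c.get("requests", {}), c.get("limits", {})
        for r in ("cpu", "memory"):
            if r not in req or r not in lim or req[r] != lim[r]:
                return "Burstable"
    return "Guaranteed"

print(qos_class([{"requests": {"cpu": "500m", "memory": "256Mi"},
                  "limits":   {"cpu": "500m", "memory": "256Mi"}}]))  # Guaranteed
print(qos_class([{"requests": {"cpu": "100m"}}]))                     # Burstable
print(qos_class([{}]))                                                # BestEffort
```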

The CPU Limits Debate

There is an active debate about whether to set CPU limits at all:

Arguments against CPU limits:

  • CPU throttling can cause latency spikes even when the node has idle CPU capacity
  • Applications performing GC or handling traffic bursts need temporary CPU spikes
  • Requests alone guarantee fair sharing — if the node is idle, why not let the container use it?

Arguments for CPU limits:

  • Prevents a single container from monopolizing CPU
  • Makes resource consumption predictable and easier to capacity plan
  • Required for Guaranteed QoS class

A common production pattern is to set CPU requests but not CPU limits, while always setting memory limits:

resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    # No CPU limit — allows bursting
    memory: 512Mi  # Memory limit always set to prevent OOM

LimitRange — Namespace Defaults

Administrators can set default requests and limits for a namespace:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: 500m
        memory: 256Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      max:
        cpu: "2"
        memory: 2Gi
      min:
        cpu: 50m
        memory: 64Mi

If a Pod in the production namespace does not specify resources, it gets the defaults. If it specifies values outside the min/max range, the API server rejects it.
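That defaulting-and-validation flow can be sketched as follows (illustrative only; the real enforcement happens in the API server at admission, real LimitRanges also validate requests against min, and quantities here are plain numbers in millicores/MiB rather than Kubernetes quantity strings):

```python
# Sketch of LimitRange behavior at admission: fill in defaults for a
# container that omitted resources, then validate limits against min/max.

def admit(container_resources, limit_range):
    res = {
        "requests": dict(container_resources.get("requests", {})),
        "limits": dict(container_resources.get("limits", {})),
    }
    for r, v in limit_range["defaultRequest"].items():
        res["requests"].setdefault(r, v)
    for r, v in limit_range["default"].items():
        res["limits"].setdefault(r, v)
    for r, lim in res["limits"].items():
        if not limit_range["min"][r] <= lim <= limit_range["max"][r]:
            raise ValueError(f"{r} limit {lim} outside allowed range")
    return res

lr = {"defaultRequest": {"cpu": 100, "memory": 128},
      "default":        {"cpu": 500, "memory": 256},
      "min":            {"cpu": 50,  "memory": 64},
      "max":            {"cpu": 2000, "memory": 2048}}

print(admit({}, lr))  # container with no resources gets the namespace defaults
# admit({"limits": {"cpu": 4000}}, lr) would be rejected: cpu above max
```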

ResourceQuota — Namespace Totals

While LimitRange controls individual containers, ResourceQuota controls the total resources consumed by a namespace:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"

This caps the team-a namespace at a total of 10 CPU cores and 20Gi of memory in requests, 20 cores and 40Gi in limits, and 50 Pods. Once a ResourceQuota tracks a compute resource, every new Pod in the namespace must specify requests and limits for that resource, or the API server rejects it at admission.
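The admission check itself is just bookkeeping; a minimal sketch (all names and numbers illustrative, values in millicores/bytes):

```python
# Sketch of ResourceQuota admission: a new Pod is rejected if adding its
# requests/limits to the namespace's running totals would exceed any hard cap.

def quota_allows(hard, used, pod):
    """Keys look like 'requests.cpu', 'limits.memory', 'pods'."""
    for key, cap in hard.items():
        if used.get(key, 0) + pod.get(key, 0) > cap:
            return False
    return True

hard = {"requests.cpu": 10_000, "requests.memory": 20 * 1024**3, "pods": 50}
used = {"requests.cpu": 9_800,  "requests.memory": 5 * 1024**3,  "pods": 30}

print(quota_allows(hard, used,
                   {"requests.cpu": 100, "requests.memory": 256 * 1024**2, "pods": 1}))  # True
print(quota_allows(hard, used,
                   {"requests.cpu": 500, "requests.memory": 256 * 1024**2, "pods": 1}))  # False
```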

Right-Sizing Resources

Setting requests and limits correctly requires observation:

# Check actual resource usage
kubectl top pods
kubectl top nodes

# View resource requests and limits for all Pods
kubectl describe node node-1 | grep -A 5 "Allocated resources"

Tools like the Vertical Pod Autoscaler (VPA) analyze actual usage and recommend resource values:

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Recommendation only, no automatic updates

Overcommit and Node Pressure

Requests cannot be overcommitted — the scheduler will not place Pods whose requests exceed available capacity. Limits can be overcommitted — the sum of all limits can exceed node capacity.

This means the node relies on not all containers hitting their limits simultaneously. If they do, the kubelet evicts Pods based on QoS class and actual usage:

  1. BestEffort Pods evicted first
  2. Burstable Pods evicted next, starting with those using the most above their requests
  3. Guaranteed Pods evicted last
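This ordering can be expressed as a sort key (a simplification; the real kubelet ranks primarily by usage in excess of requests and also weighs Pod priority):

```python
# Simplified eviction ranking under node memory pressure:
# BestEffort first, then Burstable by how far usage exceeds requests,
# Guaranteed last. Lower sort key = evicted earlier.

QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}

def eviction_order(pods):
    return sorted(pods, key=lambda p: (QOS_RANK[p["qos"]],
                                       -(p["usage"] - p["request"])))

pods = [
    {"name": "db",    "qos": "Guaranteed", "request": 512, "usage": 500},
    {"name": "batch", "qos": "BestEffort", "request": 0,   "usage": 900},
    {"name": "web",   "qos": "Burstable",  "request": 256, "usage": 700},
]
print([p["name"] for p in eviction_order(pods)])  # ['batch', 'web', 'db']
```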

Best Practices

  1. Always set memory limits — an unbounded memory leak can destabilize the entire node before eviction kicks in
  2. Always set CPU and memory requests — without them, the scheduler cannot make informed decisions
  3. Start with requests = limits (Guaranteed QoS) and relax if needed
  4. Use VPA recommendations to right-size based on actual usage data
  5. Set namespace ResourceQuotas to prevent teams from over-consuming cluster resources

Use Requests when...

  • You want to guarantee your application has enough resources to run
  • You need predictable scheduling — the scheduler reserves this capacity
  • You want to establish a resource baseline for capacity planning
  • You're setting up a Guaranteed QoS class (requests = limits)
  • You need to prevent your Pod from being evicted under memory pressure

Use Limits when...

  • You want to prevent a container from consuming too much CPU or memory
  • You need to protect the node from runaway processes
  • You want to set memory limits to trigger OOM kills for leaking applications
  • You're enforcing resource quotas at the namespace level
  • You need to cap burst resource usage

Model Interview Answer

Requests and limits control resource allocation differently. Requests are what the container is guaranteed — the scheduler finds a node that can satisfy the request and accounts for that capacity as reserved. If you request 256Mi of memory, scheduling guarantees it. Limits are the maximum — the container can burst up to the limit but no further. For CPU, exceeding the limit causes throttling. For memory, exceeding the limit triggers an OOM kill. They also determine the QoS class: if requests equal limits for every container, the Pod gets Guaranteed QoS (last to be evicted) — and because unset requests default to the limits, setting only limits also yields Guaranteed. If requests are set but lower than limits, or only some resources are specified, it gets Burstable. If neither is set, it gets BestEffort (first to be evicted).
