What Are Resource Requests and Limits in Kubernetes?

Level: beginner | Tags: pods, devops, sre, CKA, CKAD
TL;DR

Resource requests define the minimum CPU and memory a container needs and are used by the scheduler to place Pods. Limits define the maximum resources a container can use. Exceeding memory limits causes an OOM kill; exceeding CPU limits causes throttling.

Detailed Answer

Every container in a Kubernetes Pod can specify resource requests and resource limits for CPU and memory. These two settings control how Kubernetes schedules and constrains workloads.

Requests vs. Limits

| Concept | Purpose | Enforcement |
|---------|---------|-------------|
| Request | Minimum resources guaranteed to the container | Used by the scheduler to find a node with enough capacity |
| Limit | Maximum resources the container can use | Enforced at runtime by the kernel (cgroups) |

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
    - name: app
      image: myapp/server:2.1
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "1"
          memory: "512Mi"
```

CPU Resources

CPU is measured in millicores (m). One CPU core equals 1000m.

  • cpu: "250m" -- one quarter of a CPU core
  • cpu: "1" -- one full CPU core
  • cpu: "1.5" -- one and a half CPU cores
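The millicore arithmetic above can be sketched with a small helper. This is an illustrative function, not part of any Kubernetes client library:

```python
def parse_cpu_millicores(value: str) -> int:
    """Convert a Kubernetes CPU quantity string ("250m", "1", "1.5")
    to millicores. A trailing "m" means the value is already in
    millicores; otherwise it is a (possibly fractional) core count."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)

print(parse_cpu_millicores("250m"))  # 250
print(parse_cpu_millicores("1"))     # 1000
print(parse_cpu_millicores("1.5"))   # 1500
```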

Requests: The scheduler ensures the node has at least this much CPU allocatable. The container is guaranteed this CPU capacity.

Limits: If the container tries to use more CPU than its limit, it is throttled by the Completely Fair Scheduler (CFS). The container is not killed, but it becomes slower.

```bash
# Check whether a container is being CPU-throttled (cgroup v2 path)
kubectl exec resource-demo -- cat /sys/fs/cgroup/cpu.stat
# Look at the nr_throttled and throttled_usec counters
```

Memory Resources

Memory is measured in bytes. Common suffixes:

  • Ki = kibibytes (1024 bytes)
  • Mi = mebibytes (1024 Ki)
  • Gi = gibibytes (1024 Mi)
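The binary suffixes translate to bytes as follows. A minimal sketch of the conversion, covering only the binary suffixes listed above (the Kubernetes API also accepts decimal suffixes such as M and G):

```python
# Binary suffix multipliers used by Kubernetes memory quantities.
SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}

def parse_memory_bytes(value: str) -> int:
    """Convert a "256Mi"-style quantity to bytes; a bare number is plain bytes."""
    for suffix, mult in SUFFIXES.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * mult
    return int(value)

print(parse_memory_bytes("512Mi"))  # 536870912
print(parse_memory_bytes("1Gi"))    # 1073741824
```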

Requests: The scheduler ensures the node has at least this much memory allocatable.

Limits: If the container tries to use more memory than its limit, it is OOM-killed (Out of Memory). Unlike CPU throttling, this is a hard kill.

```bash
# Check if a container was OOM-killed
kubectl describe pod resource-demo
# Look for: Last State: Terminated, Reason: OOMKilled
```

How Scheduling Works with Requests

The scheduler uses requests (not limits) when deciding where to place a Pod. It sums the requests of all Pods on a node and ensures the total does not exceed the node's allocatable resources.

```
Node allocatable CPU:  4000m
Existing Pod requests: 3000m
New Pod request:       1500m

Result: the Pod cannot be scheduled on this node (3000 + 1500 > 4000)
```

This means:

  • Setting requests too high wastes cluster resources (Pods cannot be packed efficiently).
  • Setting requests too low leads to overcommitment and potential eviction under pressure.
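The fit check the scheduler performs can be sketched as a one-line predicate. This is a deliberately simplified illustration, not the real kube-scheduler code, which evaluates many other predicates (taints, affinity, volumes) as well:

```python
def fits(node_allocatable_m: int, existing_requests_m: int, new_request_m: int) -> bool:
    """Resource-fit predicate: the sum of Pod CPU requests (in millicores)
    on a node must not exceed the node's allocatable capacity."""
    return existing_requests_m + new_request_m <= node_allocatable_m

# Numbers from the example above: 4000m allocatable, 3000m already requested.
print(fits(4000, 3000, 1500))  # False: 3000 + 1500 > 4000
print(fits(4000, 3000, 1000))  # True:  3000 + 1000 <= 4000
```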

What Happens Without Requests or Limits

| Configuration | Behavior |
|---------------|----------|
| No requests, no limits | BestEffort QoS -- evicted first, no guarantees |
| Requests only | Burstable QoS -- can burst above the request, up to node capacity |
| Limits only | Requests default to the limit values (typically Guaranteed QoS) |
| Both requests and limits | Burstable or Guaranteed, depending on whether they match |

Note that if you set limits but not requests, Kubernetes defaults the requests to the limit values. This matters because it can inadvertently give the Pod a Guaranteed QoS class.
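The QoS classification can be approximated for a single-container Pod as follows. This is a hypothetical sketch of the rules in the table, not the kubelet's actual implementation, which inspects every container in the Pod:

```python
def qos_class(requests: dict, limits: dict) -> str:
    """Approximate the QoS class for a single-container Pod.
    Guaranteed requires limits for both cpu and memory, with
    requests equal to limits."""
    if not requests and not limits:
        return "BestEffort"
    # When only limits are set, Kubernetes defaults requests to the limits.
    effective_requests = requests or limits
    if limits and effective_requests == limits and set(limits) == {"cpu", "memory"}:
        return "Guaranteed"
    return "Burstable"

print(qos_class({}, {}))                               # BestEffort
print(qos_class({"cpu": "250m"}, {}))                  # Burstable
print(qos_class({}, {"cpu": "1", "memory": "512Mi"}))  # Guaranteed
print(qos_class({"cpu": "250m", "memory": "256Mi"},
                {"cpu": "1", "memory": "512Mi"}))      # Burstable
```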

LimitRange: Namespace Defaults

A LimitRange sets default resource values and constraints for a namespace:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "256Mi"
      defaultRequest:
        cpu: "250m"
        memory: "128Mi"
      max:
        cpu: "2"
        memory: "2Gi"
      min:
        cpu: "50m"
        memory: "32Mi"
```

This ensures that every container in the production namespace gets at least the default requests and cannot exceed the max values.

ResourceQuota: Namespace Totals

While LimitRange constrains individual containers, ResourceQuota constrains the aggregate resources for an entire namespace:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: namespace-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
    pods: "50"
```

Right-Sizing Resources

Setting resources correctly is one of the hardest operational tasks. These tools and approaches help:

```bash
# Check actual resource usage
kubectl top pod resource-demo

# Use Vertical Pod Autoscaler (VPA) recommendations
kubectl get vpa my-app-vpa -o jsonpath='{.status.recommendation}'
```

A practical approach:

  1. Start with generous limits and conservative requests.
  2. Observe actual usage with kubectl top or Prometheus metrics (container_cpu_usage_seconds_total, container_memory_working_set_bytes).
  3. Adjust requests to match the p95 actual usage.
  4. Set limits at 2-3x the request as a safety margin.
  5. Use VPA in recommendation mode to automate this analysis.
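Steps 3 and 4 above can be sketched numerically. A hypothetical helper, assuming you have already collected per-Pod CPU usage samples in millicores (e.g. from Prometheus):

```python
def suggest_resources(samples_m: list[int]) -> tuple[int, int]:
    """Suggest (request, limit) in millicores: the request is the
    p95 of observed usage, the limit is 2x the request (step 4's
    lower safety margin)."""
    ordered = sorted(samples_m)
    idx = int(0.95 * (len(ordered) - 1))  # nearest-rank p95 index
    request = ordered[idx]
    return request, request * 2

# Ten usage samples with one spike; the p95 ignores the outlier.
usage = [120, 140, 180, 200, 210, 230, 250, 260, 300, 900]
print(suggest_resources(usage))  # (300, 600)
```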

Best Practices

  1. Always set both requests and limits in production to ensure predictable scheduling and prevent runaway containers.
  2. Set requests based on actual observed usage -- not guesses.
  3. Be cautious with CPU limits -- some teams omit CPU limits intentionally to avoid throttling, relying on requests for scheduling and leaving CPU unbound.
  4. Never set memory limits lower than requests -- the API server rejects such a spec as invalid.
  5. Use LimitRange to enforce resource hygiene across teams sharing a cluster.
  6. Monitor for OOM kills and CPU throttling as signals that your resource settings need adjustment.

Why Interviewers Ask This

This is one of the most commonly asked Kubernetes questions. Interviewers want to ensure you can properly size workloads and prevent resource contention in shared clusters.

Common Follow-Up Questions

What happens when a container exceeds its memory limit?
The kernel's OOM killer terminates the container. The kubelet then restarts it according to the Pod's restart policy.
What happens when a container exceeds its CPU limit?
The container is CPU-throttled by the kernel's CFS scheduler. It is not killed, but its performance degrades.
What is a LimitRange and how does it relate to requests and limits?
A LimitRange is a namespace-level object that sets default, min, and max resource requests/limits for containers. It ensures all Pods in the namespace have resource constraints.

Key Takeaways

  • Requests are for scheduling; limits are for enforcement.
  • Exceeding memory limits kills the container; exceeding CPU limits throttles it.
  • Always set both requests and limits in production to ensure predictable performance.

Related Questions