What Is a Pod Security Context?

TL;DR

A security context defines privilege and access control settings for Pods and their containers. It controls the user/group IDs, Linux capabilities, filesystem permissions, SELinux labels, seccomp profiles, and whether containers run as root or with a read-only filesystem.

Detailed Answer

A security context can be set at the Pod level, at the container level, or both. It is the main built-in mechanism for hardening containers: it determines which user a process runs as, which kernel privileges it holds, and what it can write to.

Pod-Level Security Context

Pod-level settings apply to all containers in the Pod, including init and ephemeral containers:

apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp/server:2.1
      ports:
        - containerPort: 8080
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"

Key Pod-level fields:

| Field | Purpose |
|-------|---------|
| runAsUser | UID that all containers run as |
| runAsGroup | Primary GID for all containers |
| fsGroup | GID applied to mounted volumes that support ownership management; files created there get this group |
| runAsNonRoot | If true, the kubelet checks at startup that the container does not run as UID 0 and refuses to start it otherwise |
| seccompProfile | Seccomp filtering profile for syscalls |
| supplementalGroups | Additional GIDs applied to the first process in each container |
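The example above does not use supplementalGroups, so here is a minimal sketch of how it might be combined with the other identity fields. The Pod name and the GID values are placeholders chosen for illustration, not values from any real workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: groups-demo            # hypothetical name
spec:
  securityContext:
    runAsUser: 1000            # UID for every container
    runAsGroup: 3000           # primary GID
    supplementalGroups:        # extra GIDs added to the first process in each container
      - 4000                   # placeholder value
      - 5000                   # placeholder value
  containers:
    - name: app
      image: myapp/server:2.1
```

With a manifest like this, running `id` inside the container should list the supplemental GIDs alongside the primary group, although the exact output format depends on the image.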

Container-Level Security Context

Container-level settings override Pod-level settings and include additional container-specific options:

apiVersion: v1
kind: Pod
metadata:
  name: hardened-pod
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000
    fsGroup: 2000
    seccompProfile:
      type: RuntimeDefault
  containers:
    - name: app
      image: myapp/server:2.1
      securityContext:
        allowPrivilegeEscalation: false
        readOnlyRootFilesystem: true
        capabilities:
          drop:
            - ALL
          add:
            - NET_BIND_SERVICE
      ports:
        - containerPort: 8080
      volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /var/cache
      resources:
        requests:
          cpu: "250m"
          memory: "256Mi"
        limits:
          cpu: "500m"
          memory: "512Mi"
  volumes:
    - name: tmp
      emptyDir: {}
    - name: cache
      emptyDir: {}

Key container-level fields:

| Field | Purpose |
|-------|---------|
| allowPrivilegeEscalation | If false, a process cannot gain more privileges than its parent (sets the no_new_privs flag, so setuid binaries cannot raise privileges) |
| readOnlyRootFilesystem | Makes the container's root filesystem read-only |
| capabilities.drop | Linux capabilities to remove from the container |
| capabilities.add | Linux capabilities to add to the container |
| privileged | If true, the container runs with nearly the same access as root on the host (avoid in production) |
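To make the override rule concrete, the following sketch shows two containers in one Pod where only the second sets its own runAsUser. The Pod name, the sidecar container, and its image are hypothetical and exist only to illustrate precedence.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: override-demo              # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true
    runAsUser: 1000                # default UID for every container in the Pod
  containers:
    - name: app
      image: myapp/server:2.1
      # No container-level runAsUser, so this container runs as UID 1000.
    - name: sidecar
      image: myapp/sidecar:1.0     # hypothetical image
      securityContext:
        runAsUser: 2000            # overrides the Pod-level UID for this container only
        allowPrivilegeEscalation: false
```

Fields that exist only at the container level, such as allowPrivilegeEscalation, have no Pod-level default and must be repeated on each container that needs them.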

Linux Capabilities Explained

Linux capabilities break root privileges into individual units. By default, containers get a limited set of capabilities. The security best practice is to drop all and add back only what is needed:

securityContext:
  capabilities:
    drop:
      - ALL
    add:
      - NET_BIND_SERVICE   # Bind to ports below 1024

Common capabilities:

| Capability | Use Case |
|-----------|----------|
| NET_BIND_SERVICE | Bind to privileged ports (< 1024) |
| SYS_PTRACE | Trace and debug other processes (for example, running strace or gdb from a debug container) |
| NET_RAW | Use raw sockets (needed by ping and some network tools) |
| CHOWN | Change file ownership |

The fsGroup Field

The fsGroup setting is particularly important for volume permissions:

spec:
  securityContext:
    fsGroup: 2000
  containers:
    - name: app
      image: myapp/server:2.1
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-pvc

When fsGroup is set:

  1. For volume types that support ownership management, Kubernetes sets the group owner of the files in the volume to the fsGroup GID when the volume is mounted.
  2. New files created in the volume get the fsGroup GID.
  3. The container's process supplementary groups include the fsGroup.

This is critical for applications running as non-root that need write access to persistent volumes.
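On large volumes, recursively changing ownership at every mount can slow Pod startup. A hedged sketch of one way to reduce that cost is the Pod-level fsGroupChangePolicy field; the fragment below reuses the fsGroup value from the example above and assumes the volume type supports ownership management.

```yaml
spec:
  securityContext:
    fsGroup: 2000
    # "Always" is the default. "OnRootMismatch" skips the recursive change
    # when the volume's root directory already has the expected group and permissions.
    fsGroupChangePolicy: "OnRootMismatch"
```

This setting does not affect ephemeral volume types such as secret, configMap, and emptyDir.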

Seccomp Profiles

Seccomp (Secure Computing Mode) restricts which system calls a container can make:

securityContext:
  seccompProfile:
    type: RuntimeDefault  # Use the container runtime's default profile

| Profile Type | Behavior |
|-------------|----------|
| RuntimeDefault | Uses the container runtime's default seccomp profile, which blocks a few dozen risky syscalls (the exact list depends on the runtime) |
| Localhost | Uses a custom profile file stored on the node, referenced through localhostProfile |
| Unconfined | No seccomp filtering (not recommended for production) |
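For the Localhost type, a minimal sketch might look like the fragment below. The profile path is hypothetical: it assumes a file named profiles/audit.json has already been placed under the kubelet's seccomp directory (by default /var/lib/kubelet/seccomp) on every node that can run the Pod.

```yaml
securityContext:
  seccompProfile:
    type: Localhost
    # Hypothetical path, resolved relative to the kubelet's seccomp directory on the node.
    localhostProfile: profiles/audit.json
```

If the file is missing on the node, the container will fail to start, so custom profiles usually need a distribution step of their own.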

Pod Security Standards Compliance

Kubernetes defines three Pod Security Standards that map to security context configurations:

Restricted (most secure -- recommended for all production workloads):

spec:
  securityContext:
    runAsNonRoot: true
    seccompProfile:
      type: RuntimeDefault
  containers:
    - securityContext:
        allowPrivilegeEscalation: false
        capabilities:
          drop: ["ALL"]
        readOnlyRootFilesystem: true   # not required by Restricted; recommended extra hardening

Baseline (minimally restrictive):

  • No privileged containers
  • No hostNetwork, hostPID, hostIPC
  • No hostPath volumes

Privileged (unrestricted -- for infrastructure-level Pods only):

  • No restrictions applied

Verifying Security Context

# Check the security context of a running Pod
kubectl get pod secure-pod -o jsonpath='{.spec.securityContext}'

# Verify the container is running as expected user
kubectl exec secure-pod -- id
# uid=1000 gid=3000 groups=2000

# Check if the root filesystem is read-only (hardened-pod sets readOnlyRootFilesystem: true)
kubectl exec hardened-pod -- touch /test
# touch: /test: Read-only file system

Best Practices

  1. Always set runAsNonRoot: true to prevent containers from running as root.
  2. Set readOnlyRootFilesystem: true and mount writable emptyDir volumes where the application needs to write.
  3. Drop ALL capabilities and add back only the specific ones required.
  4. Set allowPrivilegeEscalation: false on every container.
  5. Use RuntimeDefault seccomp profile as a minimum baseline.
  6. Enforce Pod Security Standards at the namespace level using the built-in Pod Security Admission controller.
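As a sketch of the last practice, Pod Security Admission is configured through labels on the Namespace. The namespace name below is a placeholder; the label keys are the ones the built-in admission controller reads.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production                                       # hypothetical namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted       # reject Pods that violate Restricted
    pod-security.kubernetes.io/enforce-version: latest   # policy version to evaluate against
    pod-security.kubernetes.io/warn: restricted          # also return warnings to the client
    pod-security.kubernetes.io/audit: restricted         # also record violations in audit logs
```

A cautious rollout might start with only the warn and audit labels, then add enforce once existing workloads pass.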

Why Interviewers Ask This

Interviewers ask this to assess your understanding of container security hardening. Misconfigured security contexts are a common attack vector, and proper configuration is essential for production clusters.

Common Follow-Up Questions

What is the difference between a Pod-level and container-level security context?
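Pod-level settings apply to every container in the Pod, and some fields, such as fsGroup and supplementalGroups, exist only at this level. Where a field exists at both levels (for example, runAsUser), the container-level value overrides the Pod-level value for that container. Container-only fields include capabilities, privileged, allowPrivilegeEscalation, and readOnlyRootFilesystem.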
What is runAsNonRoot and why is it important?
runAsNonRoot: true prevents a container from running as UID 0. If the container would start as root (the image's default user is root and no non-zero runAsUser is set), the kubelet refuses to start it. This is a critical security control.
How do security contexts relate to Pod Security Standards?
Pod Security Standards (Restricted, Baseline, Privileged) define policies enforced by the Pod Security Admission controller. Security contexts are where you configure Pods to comply with these standards.

Key Takeaways

  • Security contexts control Linux-level security settings for containers.
  • Always run containers as non-root and with a read-only root filesystem in production.
  • Use the principle of least privilege -- drop all capabilities and add only what is needed.
