How Do You Troubleshoot RBAC Permission Denials in Kubernetes?

intermediate | rbac, devops, sre, cloud architect, CKA, CKS
TL;DR

RBAC troubleshooting follows a systematic approach: verify the denial with kubectl auth can-i, check existing bindings and roles, inspect audit logs for the exact request, and fix the gap by creating or updating the appropriate Role and Binding.

Detailed Answer

When a user or ServiceAccount gets a 403 Forbidden response from the Kubernetes API server, the cause is almost always a missing or misconfigured RBAC binding. Here is a systematic approach to diagnose and fix the issue.

Step 1: Reproduce and Confirm the Denial

# Confirm the specific denial
kubectl auth can-i create deployments --as=jane -n production
# no

# Get more detail — list all permissions the subject has
kubectl auth can-i --list --as=jane -n production

If --list shows the permission you expect, the issue may not be RBAC. Check admission controllers or webhook configurations.

Step 2: Identify the Subject's Identity

A common cause of RBAC failures is an identity mismatch: the user or ServiceAccount name in the binding does not match the identity the API server actually sees.

# Check your current identity
kubectl auth whoami  # Kubernetes 1.27+

# For older versions, check the kubeconfig context
kubectl config view --minify -o jsonpath='{.contexts[0].context.user}'

# For ServiceAccounts, check the Pod spec
kubectl get pod my-pod -n production \
  -o jsonpath='{.spec.serviceAccountName}'
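
ServiceAccounts authenticate to the API server as system:serviceaccount:&lt;namespace&gt;:&lt;name&gt;, and that full string is what RBAC matches against. A small sketch (the namespace and ServiceAccount name are assumptions for illustration) that builds the subject string for use with kubectl auth can-i:

```shell
# A ServiceAccount's API identity is "system:serviceaccount:<ns>:<name>".
NS="production"   # assumed namespace
SA="deployer"     # assumed ServiceAccount name
SUBJECT="system:serviceaccount:${NS}:${SA}"
echo "$SUBJECT"
# system:serviceaccount:production:deployer

# Simulate the ServiceAccount's permissions (requires cluster access):
#   kubectl auth can-i create deployments --as="$SUBJECT" -n "$NS"
```

Getting this prefix wrong (e.g., passing just "deployer" to --as) is a frequent reason a simulation reports "no" even though the binding is correct.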

For certificate-based auth, the username is the certificate's Common Name (CN) and groups come from the Organization (O) field:

# Inspect a client certificate
openssl x509 -in client.crt -noout -subject
# subject=O = dev-team, CN = jane

If the certificate says CN = jane but the RoleBinding references jane.doe, the binding will not match.
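
You can reproduce the CN/O mapping locally without a cluster. The sketch below generates a throwaway self-signed certificate with the same subject fields as the example above (file paths and the identity are assumptions) and extracts the username and group the way the API server would:

```shell
# Hypothetical demo: create a throwaway client cert with O (group) and
# CN (username) set, then read those fields back out.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
  -keyout /tmp/jane.key -out /tmp/jane.crt \
  -subj "/O=dev-team/CN=jane" 2>/dev/null

# The API server reads: username = CN, groups = O (may repeat)
openssl x509 -in /tmp/jane.crt -noout -subject -nameopt multiline |
  awk '/organizationName/ {print "group: " $3} /commonName/ {print "user:  " $3}'
# group: dev-team
# user:  jane
```

Comparing this output against the subjects in your RoleBindings catches the jane vs. jane.doe class of mismatch quickly.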

Step 3: Check Existing Bindings

# Check namespace-scoped bindings
kubectl get rolebindings -n production -o wide

# Check cluster-scoped bindings
kubectl get clusterrolebindings -o wide | grep jane

# Detailed view of a specific binding
kubectl describe rolebinding dev-access -n production

Look for:

  • Subject mismatch: Does the binding reference the correct user/group/SA name?
  • Namespace mismatch: Is the RoleBinding in the correct namespace?
  • Role reference: Does the roleRef point to a Role that actually exists?
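
Remember that permissions can arrive via a Group subject as well as a User subject, so grepping for the username alone can miss the binding that matters. A sketch that checks both paths with jq; the sample data below stands in for the output of kubectl get rolebindings -n production -o json, and all names are assumptions:

```shell
# Find bindings that grant access to a user either directly (kind: User)
# or via one of the user's groups (kind: Group).
USER_NAME="jane"
USER_GROUPS='["dev-team","system:authenticated"]'

cat <<'EOF' > /tmp/bindings.json
{"items":[
 {"metadata":{"name":"dev-access"},
  "subjects":[{"kind":"Group","name":"dev-team"}]},
 {"metadata":{"name":"ops-access"},
  "subjects":[{"kind":"User","name":"ops-bot"}]}
]}
EOF

jq -r --arg u "$USER_NAME" --argjson g "$USER_GROUPS" '
  .items[]
  | select(any(.subjects[]?;
      (.kind == "User"  and .name == $u) or
      (.kind == "Group" and (.name as $n | $g | index($n)))))
  | .metadata.name' /tmp/bindings.json
# dev-access
```

Here jane matches dev-access only through her dev-team group membership, which a plain grep for "jane" would never find.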

Step 4: Inspect the Referenced Role

# Check the Role's rules
kubectl describe role deployment-manager -n production

# Or for ClusterRoles
kubectl describe clusterrole deployment-manager

Verify that the Role includes:

  • The correct apiGroup (empty string "" for core resources, apps for Deployments, etc.)
  • The correct resource name (including subresources like pods/log)
  • The correct verbs (get, list, create, update, patch, delete, watch)
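
Pulling those three checks together, here is a minimal Role that would satisfy all of them for the resources used in this article's examples (the name and namespace are assumptions carried over from earlier steps):

```yaml
# Hypothetical Role covering the examples above: core pods plus the
# pods/log subresource, and Deployments from the "apps" group.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-manager
  namespace: production
rules:
  - apiGroups: [""]                  # core group: pods, pods/log
    resources: ["pods", "pods/log"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]              # Deployments live in "apps", not core
    resources: ["deployments"]
    verbs: ["get", "list", "create", "update", "patch", "delete"]
```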

Step 5: Check Audit Logs

API audit logs provide the definitive record. Look for 403 responses:

# On a kubeadm cluster, audit logs are typically at:
# /var/log/kubernetes/audit/audit.log

# Search for denials for a specific user
grep '"user":{"username":"jane"' /var/log/kubernetes/audit/audit.log | \
  grep '"code":403'

A typical audit log entry for a denial:

{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "requestURI": "/api/v1/namespaces/production/secrets",
  "verb": "list",
  "user": {
    "username": "jane",
    "groups": ["dev-team", "system:authenticated"]
  },
  "responseStatus": {
    "code": 403,
    "reason": "Forbidden"
  },
  "objectRef": {
    "resource": "secrets",
    "namespace": "production",
    "apiGroup": "",
    "apiVersion": "v1"
  }
}

This tells you exactly: user jane tried to list secrets in production and was denied.
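
Since audit logs are JSON lines, jq gives you a cleaner summary than grep. A sketch over a sample entry (the sample file below stands in for the real audit log path shown above):

```shell
# Summarize RBAC denials from an audit log (one JSON event per line).
# Sample data stands in for /var/log/kubernetes/audit/audit.log.
cat <<'EOF' > /tmp/audit.log
{"kind":"Event","user":{"username":"jane","groups":["dev-team"]},"verb":"list","objectRef":{"resource":"secrets","namespace":"production","apiGroup":""},"responseStatus":{"code":403}}
{"kind":"Event","user":{"username":"jane"},"verb":"get","objectRef":{"resource":"pods","namespace":"production"},"responseStatus":{"code":200}}
EOF

jq -r 'select(.responseStatus.code == 403)
  | "\(.user.username) denied: \(.verb) \(.objectRef.resource) in \(.objectRef.namespace)"' \
  /tmp/audit.log
# jane denied: list secrets in production
```

The same filter run against the real log surfaces every denied request in one line each, which is usually enough to write the missing Role rule directly.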

Common Root Causes and Fixes

1. Wrong apiGroup in the Role

# WRONG — Deployments are in the "apps" group, not core
rules:
  - apiGroups: [""]
    resources: ["deployments"]
    verbs: ["get", "list"]

# CORRECT
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list"]

2. Missing subresource

# User can get pods but cannot view logs
# MISSING: pods/log subresource
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]

# FIX: add the subresource
rules:
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]

3. RoleBinding in the wrong namespace

# The user needs access to 'production' but the binding is in 'staging'
kubectl get rolebinding dev-access -n staging
# Found! But it's in the wrong namespace.

# Fix: create the binding in the correct namespace
kubectl create rolebinding dev-access \
  --role=developer-role \
  --user=jane \
  -n production

4. Role does not exist

# The binding references a Role that was deleted or never created
kubectl describe rolebinding dev-access -n production
# Role: developer-role
kubectl get role developer-role -n production
# Error from server (NotFound)

5. ServiceAccount namespace mismatch in binding subjects

# WRONG — namespace field missing or incorrect
subjects:
  - kind: ServiceAccount
    name: deployer

# CORRECT — namespace is required for ServiceAccount subjects
subjects:
  - kind: ServiceAccount
    name: deployer
    namespace: production
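
For completeness, a full binding with the corrected subject might look like this (the binding and Role names are assumptions for illustration):

```yaml
# Hypothetical RoleBinding granting the 'deployer' ServiceAccount in
# 'production' the permissions of 'developer-role'. Note the subject's
# namespace field is required even though the binding is itself namespaced.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: deployer-access
  namespace: production
subjects:
  - kind: ServiceAccount
    name: deployer
    namespace: production
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: developer-role
```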

Troubleshooting Decision Tree

403 Forbidden
├── kubectl auth can-i confirms denial?
│   ├── YES → RBAC issue
│   │   ├── Binding exists?
│   │   │   ├── NO → Create the binding
│   │   │   └── YES → Check:
│   │   │       ├── Subject name matches identity?
│   │   │       ├── Binding in correct namespace?
│   │   │       ├── Referenced Role exists?
│   │   │       ├── Role has correct apiGroup?
│   │   │       ├── Role has correct resource (incl. subresource)?
│   │   │       └── Role has correct verb?
│   │   └── Fix the identified gap
│   └── NO (can-i says "yes" but request fails)
│       ├── Check admission webhooks
│       ├── Check PodSecurityAdmission
│       └── Check OPA/Gatekeeper policies

Useful Diagnostic Commands Reference

# Full diagnostic for a subject in a namespace
SUBJECT="jane"
NS="production"

echo "=== Permissions ==="
kubectl auth can-i --list --as="$SUBJECT" -n "$NS"

echo "=== RoleBindings ==="
kubectl get rolebindings -n "$NS" -o json | \
  jq -r ".items[] | select(.subjects[]?.name==\"$SUBJECT\") | .metadata.name"

echo "=== ClusterRoleBindings ==="
kubectl get clusterrolebindings -o json | \
  jq -r ".items[] | select(.subjects[]?.name==\"$SUBJECT\") | .metadata.name"

Why Interviewers Ask This

RBAC permission errors are among the most common issues in Kubernetes. Interviewers want to see a structured debugging methodology, not guesswork. This question separates operators who have real cluster experience from those who only know theory.

Common Follow-Up Questions

How do you read Kubernetes audit logs for RBAC denials?
Look for entries with responseStatus.code 403 in the audit log. The log includes the user, verb, resource, and namespace, which tells you exactly what permission is missing.
What could cause a permission denial even if the RBAC binding looks correct?
Possible causes include: the user's identity does not match the binding subject (e.g., certificate CN mismatch), the binding is in the wrong namespace, a webhook admission controller is denying the request, or the API group in the Role is incorrect.
How do you debug RBAC for a ServiceAccount running inside a Pod?
Use kubectl auth can-i with --as=system:serviceaccount:<namespace>:<name> to simulate the ServiceAccount's permissions. Also verify the Pod spec has the correct serviceAccountName and that the token is mounted.

Key Takeaways

  • Always start with kubectl auth can-i to confirm the denial.
  • Check both RoleBindings and ClusterRoleBindings — permissions can come from either.
  • API audit logs provide the definitive record of what was denied and why.
  • Common causes include wrong namespace, wrong apiGroup, missing subresource, and identity mismatch.

Related Questions