Kubernetes Exit Code 128

Causes and Fixes

Exit code 128 means the container process returned an invalid exit status, or the container runtime hit a fatal error while starting or managing the process. By convention, exit codes above 128 mean the process was killed by a signal (exit code = 128 + signal number); exit code 128 itself suggests either that the runtime could not determine the actual exit status or that the application passed an invalid value to exit().
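The 128 + N convention can be decoded in a few lines. The following is a minimal sketch; the describe_exit_code helper is illustrative, not part of kubectl or any library:

```python
import signal

def describe_exit_code(code: int) -> str:
    """Translate a container exit code into a human-readable description."""
    if code > 128:
        n = code - 128
        try:
            # 128 + N: the process was killed by signal N
            return f"killed by signal {n} ({signal.Signals(n).name})"
        except ValueError:
            return f"killed by unknown signal {n}"
    if code == 128:
        return "invalid exit status or runtime-level failure"
    return f"normal exit with status {code}"

print(describe_exit_code(137))  # killed by signal 9 (SIGKILL)
print(describe_exit_code(143))  # killed by signal 15 (SIGTERM)
```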

Symptoms

  • Pod shows Error or CrashLoopBackOff with exit code 128
  • kubectl describe pod shows exit code 128 in terminated state
  • Container exits immediately or after a short run
  • Container logs may be empty or incomplete
  • May appear intermittently on specific nodes

Common Causes

1. Invalid argument to exit()
   The application called exit() with a negative or out-of-range value. Exit statuses are truncated modulo 256, so a value such as -128 or 384 surfaces as exit code 128.

2. Container runtime failure
   The container runtime (containerd/CRI-O) encountered an internal error while managing the container process.

3. Docker/containerd exec failure
   The runtime failed to exec the container's entrypoint. This differs from exit code 127 (command not found) in that the exec syscall itself failed.

4. PID namespace issue
   A conflict in PID namespace setup prevented the container from starting properly.

5. Kernel cgroup issue
   The Linux kernel encountered an error setting up cgroups for the container, preventing it from running.
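The distinction in cause 3 between 127 and 128 is easy to demonstrate: a shell that cannot find a command exits with 127, never 128. A quick Python check (the command name is deliberately fictitious):

```python
import subprocess

# A shell that cannot find the command exits with 127 (command not found);
# exit code 128 is never produced by this failure mode.
result = subprocess.run("definitely_not_a_real_command_xyz", shell=True,
                        capture_output=True)
print(result.returncode)  # 127
```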

Step-by-Step Troubleshooting

1. Confirm the Exit Code

kubectl describe pod <pod-name>

Look for:

Last State:     Terminated
  Reason:       Error
  Exit Code:    128
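The same value can be read from the pod's status object. The field path below matches the Kubernetes PodStatus API (status.containerStatuses[].lastState.terminated.exitCode), but the sample pod dict is a hand-written stand-in for `kubectl get pod <name> -o json` output:

```python
# Minimal sketch: extract the last terminated exit code from a pod's
# status JSON, mirroring what `kubectl describe` prints.
pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "app",
                "lastState": {
                    "terminated": {"exitCode": 128, "reason": "Error"}
                },
            }
        ]
    }
}

def last_exit_code(pod, container=0):
    statuses = pod.get("status", {}).get("containerStatuses", [])
    terminated = statuses[container].get("lastState", {}).get("terminated")
    return terminated.get("exitCode") if terminated else None

print(last_exit_code(pod))  # 128
```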

2. Check Container Logs

kubectl logs <pod-name> --previous

Exit code 128 often produces empty or minimal logs because the failure happens at the runtime level before the application can log anything.

3. Determine if the Issue is Node-Specific

# Check which node the pod was on
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}'

# Try running the same image on a different node (cordon the suspect node
# first so the scheduler places the test pod elsewhere)
kubectl run test-pod --image=<same-image> --restart=Never --command -- sleep 60
kubectl get pod test-pod -o wide

If the pod works on another node, the issue is node-specific.

4. Check Container Runtime Logs

NODE=$(kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}')

# Check containerd logs (chroot into the host filesystem so journalctl
# can read the host's journal; the ubuntu image has no journal of its own)
kubectl debug node/$NODE -it --image=ubuntu -- bash -c "chroot /host journalctl -u containerd --since '30 minutes ago' | grep -i 'error\|exit\|kill'"

# Check kubelet logs
kubectl debug node/$NODE -it --image=ubuntu -- bash -c "chroot /host journalctl -u kubelet --since '30 minutes ago' | grep -i 'exit\|error\|128'"

5. Check Kernel Logs

kubectl debug node/$NODE -it --image=ubuntu -- bash -c "dmesg | tail -100"

# Look for cgroup errors
kubectl debug node/$NODE -it --image=ubuntu -- bash -c "dmesg | grep -i 'cgroup\|oom\|error'"

6. Check if the Application Returns Invalid Exit Codes

Some programming languages have quirks with exit codes:

# Python: exit statuses are truncated modulo 256
import sys
sys.exit(-1)    # becomes exit code 255
sys.exit(256)   # becomes exit code 0
sys.exit(-128)  # becomes exit code 128

// Go: exit statuses above 125 collide with shell and runtime conventions
os.Exit(128)  // reports exit code 128, indistinguishable from a runtime failure

If the application explicitly calls exit(128), it might be a programming error. Check the application source code for explicit exit calls.
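The wrap-around is easy to verify directly: exit statuses are reported modulo 256, so a buggy call like exit(-128) or exit(384) surfaces as exit code 128. A quick check using Python subprocesses:

```python
import subprocess
import sys

# Spawn a child interpreter that exits with an out-of-range value;
# the OS reports the status truncated modulo 256.
for value, expected in [(-128, 128), (384, 128), (256, 0), (-1, 255)]:
    proc = subprocess.run([sys.executable, "-c", f"import sys; sys.exit({value})"])
    assert proc.returncode == expected, (value, proc.returncode)
    print(f"exit({value}) -> observed exit code {proc.returncode}")
```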

7. Check Runtime Version and Known Issues

# Check container runtime version
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'

# Check kubelet version
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.kubeletVersion}'

Search the container runtime's issue tracker for known bugs related to exit code 128 with your specific runtime version.

8. Check Resource Constraints

Extreme resource constraints can cause the runtime to fail during container setup.

# Check node resources
kubectl describe node <node-name> | grep -A10 "Allocated resources"

# Check if the node is under pressure
kubectl describe node <node-name> | grep -A5 "Conditions"

# Check pod resource requests
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'

9. Test the Container Locally

# Run the container locally
docker run --rm <image>
echo "Exit code: $?"

# Run with resource constraints similar to the pod
docker run --rm --memory=256m --cpus=0.5 <image>
echo "Exit code: $?"

If the container works locally but fails in Kubernetes, the issue is likely environment-specific (node resources, runtime config, kernel parameters).
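When scripting local tests like the docker runs above, note that Python's subprocess module reports a signal kill as a negative returncode rather than the 128 + N form a shell's $? shows. A small sketch of the difference:

```python
import signal
import subprocess

# Start a long-running child and kill it, as the runtime or kernel might.
proc = subprocess.Popen(["sleep", "60"])
proc.send_signal(signal.SIGKILL)
proc.wait()

# Python reports signal death as -N; a shell reports 128 + N (137 here).
print(proc.returncode)        # -9
print(128 - proc.returncode)  # 137, the shell-style exit code
```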

10. Restart the Runtime (Last Resort)

If the runtime is in a bad state on the node:

# Drain the node first
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Restart the runtime (chroot into the host filesystem so systemctl
# reaches the host's systemd)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host systemctl restart containerd

# Uncordon the node
kubectl uncordon <node-name>

11. Verify the Fix

# Delete the failing pod
kubectl delete pod <pod-name>

# Watch for the replacement
kubectl get pods -w

# Check the exit code
kubectl describe pod <pod-name> | grep "Exit Code"

The new pod should start and remain running without exit code 128.

How to Explain This in an Interview

I would explain that exit code 128 is unusual and often indicates a runtime-level problem rather than an application error. The convention is that exit codes 128+N mean the process was killed by signal N (so 137 = 128+9 = SIGKILL, 143 = 128+15 = SIGTERM). Exit code 128 itself means either the application passed an invalid value to exit() or the runtime could not determine the exit status. I would debug by checking runtime logs on the node, looking at dmesg for kernel errors, and testing the container on a different node.

Prevention

  • Ensure applications use valid exit codes (0-125)
  • Keep container runtimes updated to latest stable versions
  • Monitor node health with node-problem-detector
  • Test containers thoroughly before deployment
  • Set up kernel parameter monitoring for cgroup issues

Related Errors