Kubernetes Exit Code 137

Causes and Fixes

Exit code 137 means the container process was killed by SIGKILL (signal 9). The formula is 128 + 9 = 137. This is most commonly caused by the Linux OOM killer terminating the process for exceeding its memory limit, but it can also result from kubectl delete, preemption, or a failed liveness probe.

Symptoms

  • Pod shows OOMKilled or Error with exit code 137
  • kubectl describe pod shows exit code 137 in terminated state
  • Container may show 'OOMKilled' as the termination reason
  • Container restarts with CrashLoopBackOff after SIGKILL
  • Node dmesg may show OOM killer messages

Common Causes

1. Out of memory (OOM) kill: the container exceeded its memory limit and the kernel's OOM killer sent SIGKILL. Increase the memory limit or fix the memory leak.
2. Liveness probe failure: the kubelet killed the container after consecutive liveness probe failures. It sends SIGTERM first and SIGKILL once the grace period expires.
3. Pod eviction or preemption: the kubelet evicted the pod due to node resource pressure, or a higher-priority pod preempted it.
4. Manual kill via kubectl delete: someone deleted the pod and the container did not exit on the initial SIGTERM (which would produce exit code 143), so it received SIGKILL when the grace period expired.
5. Node shutdown: the node was shut down or restarted, and containers received SIGKILL after the shutdown grace period.

Step-by-Step Troubleshooting

1. Confirm the Exit Code and Reason

kubectl describe pod <pod-name>

Check the termination reason:

Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137

If the reason is OOMKilled, the container exceeded its memory limit. If the reason is Error, another source sent the SIGKILL.
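
You can also pull the reason and exit code directly with jsonpath; this one-liner is a quick sketch and assumes the first container in the pod is the one that died (adjust the index otherwise):

kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'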

2. Distinguish Between OOM and Other SIGKILL Sources

| Reason | Cause | How to Verify |
|--------|-------|---------------|
| OOMKilled | Memory limit exceeded | kubectl describe pod shows Reason: OOMKilled |
| Liveness probe | Probe failed | Events show "Killing" after "Unhealthy" |
| Manual delete | kubectl delete pod | Check audit logs or user activity |
| Preemption | Higher-priority pod | Events show "Preempted by" |
| Node shutdown | Node going down | Node shows NotReady around the same time |

# Check events for clues
kubectl get events --field-selector involvedObject.name=<pod-name> --sort-by='.lastTimestamp'

3. Troubleshoot OOMKilled (Most Common)

Check memory usage and limits.

# Check memory limits
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources.limits.memory}'

# Check current memory usage (if pod is running)
kubectl top pod <pod-name> --containers

# Check node-level OOM events
NODE=$(kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}')
kubectl debug node/$NODE -it --image=ubuntu -- bash -c "dmesg | grep -i oom | tail -20"

Fix by increasing memory limits:

resources:
  requests:
    memory: "512Mi"
  limits:
    memory: "1Gi"

4. Troubleshoot Liveness Probe Kills

If the pod events show liveness probe failures before the kill:

Warning  Unhealthy  Liveness probe failed: connection refused
Warning  Unhealthy  Liveness probe failed: connection refused
Warning  Unhealthy  Liveness probe failed: connection refused
Normal   Killing    Container app failed liveness probe, will be restarted

# Check liveness probe configuration
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].livenessProbe}' | jq .

Common fixes:

# Add a startup probe for slow-starting apps
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30
  periodSeconds: 10

# Relax the liveness probe
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 5
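
After relaxing the probe, confirm the failures have stopped. Probe failures surface as events with reason Unhealthy, so filtering on that reason is a quick check:

# Should return no new entries once the probe is passing
kubectl get events --field-selector reason=Unhealthy --sort-by='.lastTimestamp'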

5. Profile Memory Usage

For applications that are OOM-killed, understand where memory is being consumed.

Java:

# Check JVM heap settings
kubectl exec <pod-name> -- jcmd 1 VM.flags | grep -i heap

# Get heap usage
kubectl exec <pod-name> -- jcmd 1 GC.heap_info

# Configure the JVM for containers. Note: JAVA_OPTS is only honored if your
# entrypoint passes it to the java command; JAVA_TOOL_OPTIONS is read by the JVM automatically.
env:
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0 -XX:+UseContainerSupport -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"

Go:

# If pprof is enabled
kubectl port-forward <pod-name> 6060:6060
go tool pprof http://localhost:6060/debug/pprof/heap

Node.js:

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"  # 75% of 1Gi limit

6. Check for Memory Leaks

If memory usage grows over time until OOM:

# Monitor memory over time
watch kubectl top pod <pod-name> --containers

# Set up a Prometheus query to track the trend
# PromQL: container_memory_working_set_bytes{pod="<pod-name>"}

If memory grows steadily over time until the OOM kill, the application likely has a memory leak. Use language-specific profiling tools (see step 5) to identify it.
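
A trend query makes gradual growth easier to see than point-in-time samples. This is a sketch assuming cAdvisor metrics are scraped by Prometheus:

# PromQL: average rate of change of working-set memory over the last 6 hours
# deriv(container_memory_working_set_bytes{pod="<pod-name>"}[6h])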

7. Handle SIGTERM to Avoid SIGKILL

When pods are deleted, Kubernetes sends SIGTERM first, then SIGKILL after the grace period (default 30 seconds). If your app does not exit on SIGTERM within that window, it is killed with SIGKILL and reports exit code 137.

# Python example
import signal
import sys

def handle_sigterm(signum, frame):
    print("Received SIGTERM, shutting down gracefully...")
    # Clean up resources
    sys.exit(0)

signal.signal(signal.SIGTERM, handle_sigterm)

Increase the grace period if your app needs more time:

spec:
  terminationGracePeriodSeconds: 60
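
You can confirm the value actually in effect on a running pod:

kubectl get pod <pod-name> -o jsonpath='{.spec.terminationGracePeriodSeconds}'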

8. Verify the Fix

# Watch the pod
kubectl get pods -w

# Monitor memory usage
kubectl top pod <pod-name> --containers

# Check that no new OOM events appear
kubectl get events --field-selector reason=OOMKilling

# Verify restart count is stable
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].restartCount}'

The pod should run stably within its memory limits.

How to Explain This in an Interview

I would explain that 137 = 128 + 9, meaning SIGKILL, and the most common cause is OOM killing. I would distinguish between container-level OOM (the cgroup limit is exceeded, reported as OOMKilled) and node-level memory pressure (which causes pod eviction). For OOM, I would discuss right-sizing memory limits using metrics, detecting memory leaks with profiling tools, and using VPA for automated recommendations. For liveness probe kills, I would explain the difference between liveness and startup probes.

Prevention

  • Set appropriate memory limits based on profiled application usage
  • Use VPA to get memory limit recommendations
  • Handle SIGTERM gracefully so containers stop before SIGKILL
  • Use startup probes instead of relying on liveness probe initialDelaySeconds
  • Monitor container memory usage and alert at 80% of the limit (see the PromQL sketch after this list)
  • For JVM apps, set -XX:MaxRAMPercentage=75.0
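
A memory-pressure alert can reuse the working-set metric from step 6. This is a sketch assuming cAdvisor metrics are available; pods without a memory limit report a limit of 0 and should be filtered out:

# PromQL: fire when working-set memory exceeds 80% of the configured limit
# container_memory_working_set_bytes / container_spec_memory_limit_bytes > 0.8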

Related Errors