Kubernetes Ingress 502/503/504

Causes and Fixes

502 Bad Gateway, 503 Service Unavailable, and 504 Gateway Timeout errors from a Kubernetes Ingress indicate that the ingress controller cannot successfully proxy traffic to the backend pods. These errors mean the controller received the request but failed to get a valid response from the upstream Service.

Symptoms

  • HTTP 502 Bad Gateway responses from the ingress controller
  • HTTP 503 Service Unavailable when accessing the application through Ingress
  • HTTP 504 Gateway Timeout for requests that take too long
  • Intermittent errors when backends are being rolled out or scaled
  • Errors appear at the ingress layer but direct pod access works fine

Common Causes

1. Backend pods are not ready or not running
   The Service has no healthy endpoints because pods are crashing (502), all pods are unavailable (503), or pods are still starting up.

2. Backend response timeout
   The application takes longer to respond than the ingress controller's proxy timeout, causing 504 errors. Common with slow database queries or large file uploads.

3. Pod is terminating but still receiving traffic
   During rolling updates, the ingress controller sends traffic to a pod that has already started terminating, causing 502 errors. This is a race condition between endpoint removal and pod termination.

4. Incorrect backend port configuration
   The Service port or targetPort does not match what the application is listening on, so the ingress controller gets connection refused from the backend (502).

5. Resource exhaustion on backend pods
   Pods are running out of CPU, memory, or file descriptors and cannot handle requests, leading to 502 or 503 errors under load.

6. Ingress controller resource limits
   The ingress controller itself is overwhelmed, running out of connections, or hitting its own CPU/memory limits, causing it to return 503 errors.

Step-by-Step Troubleshooting

Each of these status codes points to a different failure mode between the ingress controller and its backends. The steps below cover diagnosis and resolution for all three.

1. Identify Which Error Code You Are Getting

The specific HTTP status code narrows the diagnosis.

# Test the endpoint and capture the status code
curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/

# Get more detail with verbose output
curl -v -H "Host: <hostname>" http://<ingress-ip>/
  • 502 Bad Gateway: The ingress controller connected to a backend but received an invalid response (connection reset, empty response, protocol error)
  • 503 Service Unavailable: No backends are available, or the controller is overloaded
  • 504 Gateway Timeout: The backend did not respond within the proxy timeout period
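The mapping above can be wrapped in a small helper for triage scripts. This is a convenience sketch, not part of any Kubernetes or NGINX tooling; the wording of each hypothesis is our own.

```shell
# Rough triage helper: map a gateway status code to its most likely cause.
classify_gateway_error() {
  case "$1" in
    502) echo "backend returned an invalid response (reset, empty reply, protocol error)" ;;
    503) echo "no healthy backends, or the controller is overloaded" ;;
    504) echo "backend exceeded the proxy timeout" ;;
    *)   echo "not a gateway error" ;;
  esac
}
```

Usage: `code=$(curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/); classify_gateway_error "$code"`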

2. Check Backend Pod Health

The most common cause of all three errors is unhealthy backend pods.

# Check the Service endpoints
kubectl get endpoints <backend-service>

# Check pod statuses
kubectl get pods -l <selector> -o wide

# Check if pods are ready
kubectl get pods -l <selector> -o custom-columns='NAME:.metadata.name,READY:.status.conditions[?(@.type=="Ready")].status,RESTARTS:.status.containerStatuses[0].restartCount'

If endpoints are empty (503), pods are crashing (502), or pods are overloaded (504), address the pod-level issue first.
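If pods flap between Ready and NotReady, review the readiness probe definition itself. A minimal sketch follows; the /healthz path and port 8080 are placeholders for whatever your application actually exposes.

```yaml
# Container-level fragment; path and port are hypothetical placeholders.
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```

A probe that is stricter than the app's real startup behavior removes endpoints unnecessarily and surfaces as 503s; one that is too lenient routes traffic to pods that cannot serve it and surfaces as 502s.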

3. Test Direct Backend Connectivity

Bypass the Ingress to determine if the issue is with the backend or the ingress layer.

# Port-forward to the backend Service
kubectl port-forward service/<backend-service> 8080:<service-port> &

# Test the backend directly
curl -v http://localhost:8080/

# Or exec into a debug pod and test
kubectl run backend-test --image=busybox --restart=Never --rm -it -- wget -qO- --timeout=10 http://<backend-service>:<port>/

If the backend responds correctly when accessed directly but fails through the Ingress, the issue is in the ingress controller configuration or the routing between controller and backend.

4. Check Ingress Controller Logs

The ingress controller logs show detailed upstream error information.

# NGINX ingress controller
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep -E "502|503|504|upstream"

# Look for specific error patterns
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep "connect() failed"
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep "upstream timed out"
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep "no live upstreams"

Common log patterns:

  • connect() failed (111: Connection refused) — 502, backend not listening
  • upstream timed out (110: Connection timed out) — 504, backend too slow
  • no live upstreams while connecting to upstream — 503, no healthy backends
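The three patterns above can be tallied with a small filter to see which failure mode dominates. This is a sketch of our own, not controller tooling; it simply counts the telltale phrases on stdin.

```shell
# Count occurrences of the three telltale upstream error patterns on stdin,
# most frequent first.
summarize_upstream_errors() {
  grep -oE 'connect\(\) failed|upstream timed out|no live upstreams' \
    | sort | uniq -c | sort -rn
}
```

Usage: `kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=500 | summarize_upstream_errors`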

5. Fix 502 Errors: Connection Issues

502 errors typically mean the backend connection is being refused or reset.

# Verify the backend is listening on the correct port
kubectl exec <backend-pod> -- ss -tlnp

# Check if the Service targetPort matches
kubectl get service <backend-service> -o jsonpath='TargetPort: {.spec.ports[0].targetPort}'

# Check for pod restarts that could cause brief 502s
kubectl get pods -l <selector> -w

For 502 errors during rolling updates, add a preStop lifecycle hook to delay pod termination, giving the ingress controller time to remove the endpoint from its upstream pool.

spec:
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "15"]
  terminationGracePeriodSeconds: 30

6. Fix 503 Errors: No Available Backends

503 means the ingress controller has no upstream backends to forward to.

# Confirm the Service has no endpoints
kubectl get endpoints <backend-service>

# Check why pods are not ready
kubectl describe pods -l <selector> | grep -A5 "Readiness"

# Scale up if needed
kubectl scale deployment <deployment-name> --replicas=3

# Wait for pods to become ready
kubectl rollout status deployment/<deployment-name>

If all pods are healthy but the ingress controller still returns 503, the controller may have a stale configuration.

# Force the ingress controller to reload
kubectl rollout restart deployment -n ingress-nginx ingress-nginx-controller

7. Fix 504 Errors: Timeout Issues

504 errors mean the backend took too long to respond.

# Check how long the backend takes to respond
kubectl exec <debug-pod> -- sh -c 'time wget -qO- http://<backend-service>:<port>/'

# Check current ingress timeout annotations
kubectl get ingress <ingress-name> -o yaml | grep -i timeout

Increase the proxy timeout via ingress annotations.

# For NGINX ingress controller
kubectl annotate ingress <ingress-name> \
  nginx.ingress.kubernetes.io/proxy-connect-timeout="60" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="120" \
  nginx.ingress.kubernetes.io/proxy-read-timeout="120"
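The same timeouts can be set declaratively in the Ingress manifest instead of via kubectl annotate; a fragment sketch, with values in seconds:

```yaml
# Fragment of an Ingress manifest (NGINX ingress controller annotations).
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
```

Keeping the annotations in the manifest means they survive a resource being recreated, which an imperative annotate does not.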

Also check if the backend is resource-constrained and taking longer than normal.

kubectl top pods -l <selector>

# Increase resource limits if the pod is being throttled
kubectl set resources deployment <deployment-name> --limits=cpu=1000m,memory=1Gi

8. Handle Rolling Update 502 Errors

Transient 502 errors during deployments are a common issue caused by the timing gap between endpoint removal and pod termination.

# Configure the deployment for zero-downtime updates
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "terminationGracePeriodSeconds": 30,
        "containers": [{
          "name": "<container-name>",
          "lifecycle": {
            "preStop": {
              "exec": {
                "command": ["sleep", "15"]
              }
            }
          }
        }]
      }
    },
    "strategy": {
      "rollingUpdate": {
        "maxUnavailable": 0,
        "maxSurge": 1
      }
    }
  }
}'

The preStop sleep gives the ingress controller time to:

  1. Detect the pod is terminating
  2. Remove it from the upstream pool
  3. Drain existing connections
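If you manage manifests rather than patching live objects, the same settings can be expressed declaratively. A Deployment fragment sketch; the container name is a placeholder and metadata/selector are omitted:

```yaml
# Declarative equivalent of the patch above (fragment; "app" is a placeholder).
apiVersion: apps/v1
kind: Deployment
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
      - name: app
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "15"]
```

Keep terminationGracePeriodSeconds comfortably larger than the preStop sleep, or the kubelet will kill the container before the drain window ends.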

9. Check Ingress Controller Resources

If the ingress controller itself is under pressure, it can return 503 for all backends.

# Check ingress controller resource usage
kubectl top pods -n ingress-nginx

# Check for resource limits
kubectl get deployment -n ingress-nginx ingress-nginx-controller -o jsonpath='{.spec.template.spec.containers[0].resources}'

# Check for connection limits
kubectl exec -n ingress-nginx <ingress-pod> -- cat /etc/nginx/nginx.conf | grep worker_connections

Scale the ingress controller or increase its resources if it is at capacity.

kubectl scale deployment -n ingress-nginx ingress-nginx-controller --replicas=3

10. Verify Resolution

After applying fixes, verify the errors are resolved.

# Test the endpoint
curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/

# Run multiple requests to check for intermittent errors
for i in $(seq 1 20); do
  code=$(curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/)
  echo "Request $i: $code"
done

# Check ingress controller logs for errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50 --since=5m | grep -E "502|503|504"

All requests should return 200 (or the expected application status code). If intermittent errors remain, check whether they correlate with pod restarts, scaling events, or resource pressure and address accordingly.

How to Explain This in an Interview

I would explain the distinction between each status code in the context of an ingress controller acting as a reverse proxy: 502 means the controller connected to a backend but received an invalid response (connection reset, protocol error), 503 means no backends are available to handle the request, and 504 means the backend did not respond within the timeout period. I'd discuss how rolling updates cause transient 502s and how to mitigate them with proper readiness probes, preStop hooks, and the pod lifecycle. I'd also cover the ingress controller's connection pooling, keepalive settings, and how to tune timeouts for different types of applications.

Prevention

  • Configure readiness probes that accurately reflect when the app can serve traffic
  • Add preStop hooks to give the ingress controller time to remove endpoints
  • Tune ingress controller proxy timeouts for your application's needs
  • Set up proper connection draining during rolling updates
  • Monitor backend response times and error rates at the ingress layer

Related Errors