Kubernetes Ingress 502/503/504
Causes and Fixes
502 Bad Gateway, 503 Service Unavailable, and 504 Gateway Timeout errors from a Kubernetes Ingress indicate that the ingress controller cannot successfully proxy traffic to the backend pods. These errors mean the controller received the request but failed to get a valid response from the upstream Service.
Symptoms
- HTTP 502 Bad Gateway responses from the ingress controller
- HTTP 503 Service Unavailable when accessing the application through Ingress
- HTTP 504 Gateway Timeout for requests that take too long
- Intermittent errors when backends are being rolled out or scaled
- Errors appear at the ingress layer but direct pod access works fine
Step-by-Step Troubleshooting
Each status code points to a different type of failure, so diagnosis starts with pinning down which one you are seeing. The steps below cover diagnosis and resolution for all three.
1. Identify Which Error Code You Are Getting
The specific HTTP status code narrows the diagnosis.
# Test the endpoint and capture the status code
curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/
# Get more detail with verbose output
curl -v -H "Host: <hostname>" http://<ingress-ip>/
- 502 Bad Gateway: The ingress controller connected to a backend but received an invalid response (connection reset, empty response, protocol error)
- 503 Service Unavailable: No backends are available, or the controller is overloaded
- 504 Gateway Timeout: The backend did not respond within the proxy timeout period
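As a quick triage mnemonic, the mapping above can be captured in a small shell helper (hypothetical, for illustration only; not part of any tool):

```shell
#!/bin/sh
# Hypothetical helper: map a gateway status code to its most likely cause.
# This just encodes the triage table above.
classify_gateway_error() {
  case "$1" in
    502) echo "invalid backend response: connection reset, crash, or wrong port" ;;
    503) echo "no available backends: empty endpoints or overloaded controller" ;;
    504) echo "backend exceeded the proxy timeout" ;;
    *)   echo "not a gateway error" ;;
  esac
}

classify_gateway_error 504   # prints "backend exceeded the proxy timeout"
```

Feeding the code captured by the curl command above into a helper like this gives a starting hypothesis before digging into logs.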
2. Check Backend Pod Health
The most common cause of all three errors is unhealthy backend pods.
# Check the Service endpoints
kubectl get endpoints <backend-service>
# Check pod statuses
kubectl get pods -l <selector> -o wide
# Check if pods are ready
kubectl get pods -l <selector> -o custom-columns=NAME:.metadata.name,READY:.status.containerStatuses[0].ready,RESTARTS:.status.containerStatuses[0].restartCount
If endpoints are empty (503), pods are crashing (502), or pods are overloaded (504), address the pod-level issue first.
3. Test Direct Backend Connectivity
Bypass the Ingress to determine if the issue is with the backend or the ingress layer.
# Port-forward to the backend Service
kubectl port-forward service/<backend-service> 8080:<service-port> &
# Test the backend directly
curl -v http://localhost:8080/
# Or exec into a debug pod and test
kubectl run backend-test --image=busybox --restart=Never --rm -it -- wget -qO- --timeout=10 http://<backend-service>:<port>/
If the backend responds correctly when accessed directly but fails through the Ingress, the issue is in the ingress controller configuration or the routing between controller and backend.
4. Check Ingress Controller Logs
The ingress controller logs show detailed upstream error information.
# NGINX ingress controller
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep -E "502|503|504|upstream"
# Look for specific error patterns
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep "connect() failed"
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep "upstream timed out"
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=200 | grep "no live upstreams"
Common log patterns:
- connect() failed (111: Connection refused): 502, backend not listening
- upstream timed out (110: Connection timed out): 504, backend too slow
- no live upstreams while connecting to upstream: 503, no healthy backends
5. Fix 502 Errors: Connection Issues
502 errors typically mean the backend connection is being refused or reset.
# Verify the backend is listening on the correct port
kubectl exec <backend-pod> -- ss -tlnp
# Check if the Service targetPort matches
kubectl get service <backend-service> -o jsonpath='TargetPort: {.spec.ports[0].targetPort}'
# Check for pod restarts that could cause brief 502s
kubectl get pods -l <selector> -w
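A frequent source of 502s is a mismatch between the Service's targetPort and the port the container actually listens on. A minimal sketch of the relationship (names, image, and port numbers are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service     # illustrative name
spec:
  selector:
    app: backend
  ports:
  - port: 80                # port clients and the Ingress use
    targetPort: 8080        # must match the containerPort below
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: app
        image: <your-image>
        ports:
        - containerPort: 8080   # the port the process binds to
```

If targetPort points at a port nothing is listening on, the connection is refused and the ingress controller surfaces it as a 502.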
For 502 errors during rolling updates, add a preStop lifecycle hook to delay pod termination, giving the ingress controller time to remove the endpoint from its upstream pool.
spec:
  containers:
  - name: app
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "15"]
  terminationGracePeriodSeconds: 30
6. Fix 503 Errors: No Available Backends
503 means the ingress controller has no upstream backends to forward to.
# Confirm the Service has no endpoints
kubectl get endpoints <backend-service>
# Check why pods are not ready
kubectl describe pods -l <selector> | grep -A5 "Readiness"
# Scale up if needed
kubectl scale deployment <deployment-name> --replicas=3
# Wait for pods to become ready
kubectl rollout status deployment/<deployment-name>
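If pods stay NotReady, the readiness probe itself is often the culprit: wrong path, wrong port, or timings tighter than the app's startup. A typical probe, assuming the app exposes a health endpoint at /healthz on port 8080 (both illustrative):

```yaml
spec:
  containers:
  - name: app
    readinessProbe:
      httpGet:
        path: /healthz      # illustrative; use your app's real health endpoint
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 5
      failureThreshold: 3
```

A probe that never passes leaves the pod out of the Service endpoints, which the ingress controller reports as 503.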
If all pods are healthy but the ingress controller still returns 503, the controller may have a stale configuration.
# Force the ingress controller to reload
kubectl rollout restart deployment -n ingress-nginx ingress-nginx-controller
7. Fix 504 Errors: Timeout Issues
504 errors mean the backend took too long to respond.
# Check how long the backend takes to respond
kubectl exec <debug-pod> -- sh -c 'time wget -qO- http://<backend-service>:<port>/'
# Check current ingress timeout annotations
kubectl get ingress <ingress-name> -o yaml | grep -i timeout
Increase the proxy timeout via ingress annotations.
# For NGINX ingress controller
kubectl annotate ingress <ingress-name> --overwrite \
  nginx.ingress.kubernetes.io/proxy-connect-timeout="60" \
  nginx.ingress.kubernetes.io/proxy-send-timeout="120" \
  nginx.ingress.kubernetes.io/proxy-read-timeout="120"
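The same timeouts can be set declaratively in the Ingress manifest, which survives re-creation (values are examples; tune them to your application):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: <ingress-name>
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
spec:
  rules:
  - host: <hostname>
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: <backend-service>
            port:
              number: 80
```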
Also check if the backend is resource-constrained and taking longer than normal.
kubectl top pods -l <selector>
# Increase resource limits if the pod is being throttled
kubectl set resources deployment <deployment-name> --limits=cpu=1000m,memory=1Gi
8. Handle Rolling Update 502 Errors
Transient 502 errors during deployments are a common issue caused by the timing gap between endpoint removal and pod termination.
# Configure the deployment for zero-downtime updates
kubectl patch deployment <deployment-name> -p '{
  "spec": {
    "template": {
      "spec": {
        "terminationGracePeriodSeconds": 30,
        "containers": [{
          "name": "<container-name>",
          "lifecycle": {
            "preStop": {
              "exec": {
                "command": ["sleep", "15"]
              }
            }
          }
        }]
      }
    },
    "strategy": {
      "rollingUpdate": {
        "maxUnavailable": 0,
        "maxSurge": 1
      }
    }
  }
}'
The preStop sleep gives the ingress controller time to:
- Detect the pod is terminating
- Remove it from the upstream pool
- Drain existing connections
9. Check Ingress Controller Resources
If the ingress controller itself is under pressure, it can return 503 for all backends.
# Check ingress controller resource usage
kubectl top pods -n ingress-nginx
# Check for resource limits
kubectl get deployment -n ingress-nginx ingress-nginx-controller -o jsonpath='{.spec.template.spec.containers[0].resources}'
# Check for connection limits
kubectl exec -n ingress-nginx <ingress-pod> -- cat /etc/nginx/nginx.conf | grep worker_connections
Scale the ingress controller or increase its resources if it is at capacity.
kubectl scale deployment -n ingress-nginx ingress-nginx-controller --replicas=3
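Worker connection limits can also be raised through the controller's ConfigMap. The keys below are from the ingress-nginx project; verify the names and defaults against your controller version:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  max-worker-connections: "65536"   # raise if the controller hits its connection ceiling
  worker-processes: "auto"          # one worker per CPU core
```

The controller watches its ConfigMap and reloads NGINX when these values change, so no restart is required.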
10. Verify Resolution
After applying fixes, verify the errors are resolved.
# Test the endpoint
curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/
# Run multiple requests to check for intermittent errors
for i in $(seq 1 20); do
  code=$(curl -s -o /dev/null -w "%{http_code}" -H "Host: <hostname>" http://<ingress-ip>/)
  echo "Request $i: $code"
done
# Check ingress controller logs for errors
kubectl logs -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx --tail=50 --since=5m | grep -E "502|503|504"
All requests should return 200 (or the expected application status code). If intermittent errors remain, check whether they correlate with pod restarts, scaling events, or resource pressure and address accordingly.
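To turn the verification loop into a pass/fail check, a small helper can count non-2xx codes (hypothetical, pure shell):

```shell
#!/bin/sh
# Hypothetical helper: count non-2xx status codes collected
# from the verification loop above.
count_errors() {
  errors=0
  for code in "$@"; do
    case "$code" in
      2??) ;;                        # any 2xx counts as success
      *)   errors=$((errors + 1)) ;;
    esac
  done
  echo "$errors"
}

# Example: two gateway errors in this sample
count_errors 200 200 503 200 504   # prints 2
```

Failing the script when the count is nonzero makes this easy to wire into CI or a post-deploy smoke test.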
How to Explain This in an Interview
I would explain the distinction between each status code in the context of an ingress controller acting as a reverse proxy: 502 means the controller connected to a backend but received an invalid response (connection reset, protocol error), 503 means no backends are available to handle the request, and 504 means the backend did not respond within the timeout period. I'd discuss how rolling updates cause transient 502s and how to mitigate them with proper readiness probes, preStop hooks, and the pod lifecycle. I'd also cover the ingress controller's connection pooling, keepalive settings, and how to tune timeouts for different types of applications.
Prevention
- Configure readiness probes that accurately reflect when the app can serve traffic
- Add preStop hooks to give the ingress controller time to remove endpoints
- Tune ingress controller proxy timeouts for your application's needs
- Set up proper connection draining during rolling updates
- Monitor backend response times and error rates at the ingress layer