Kubernetes ErrImagePull

Causes and Fixes

ErrImagePull indicates that the kubelet's attempt to pull a container image has failed. If pulls keep failing, Kubernetes transitions the pod to ImagePullBackOff and waits with an increasing delay between retries. The error surfaces immediately and points to issues with the image reference, registry credentials, or network access.
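
The retry behavior can be sketched as follows. The exact timings are kubelet implementation details, but as an approximation the back-off roughly doubles per failure and caps at about five minutes:

```shell
# Sketch (assumption): the kubelet roughly doubles the delay between pull
# retries, starting near 10s and capping at about 5 minutes (300s).
backoff_delays() {
  delay=10
  for attempt in 1 2 3 4 5 6 7; do
    echo "attempt ${attempt}: retry after ${delay}s"
    delay=$((delay * 2))
    [ "$delay" -gt 300 ] && delay=300
  done
}
backoff_delays
```

This is why a pod flips between ErrImagePull (a pull just failed) and ImagePullBackOff (the kubelet is waiting out the delay).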

Symptoms

  • Pod status shows ErrImagePull in kubectl get pods output
  • kubectl describe pod shows 'Failed to pull image' in events
  • Pod quickly transitions from ErrImagePull to ImagePullBackOff on repeated failures
  • Container never starts and restart count stays at zero

Common Causes

1. Image does not exist in the registry
The image name or tag is incorrect or was never pushed. Confirm the image exists by querying the registry directly.

2. Authentication failure for private registry
The imagePullSecret is missing, misconfigured, or contains expired credentials. Recreate the secret with valid tokens.

3. Registry DNS resolution failure
The node cannot resolve the registry hostname. Image pulls use the node's resolver, not cluster DNS, so check the node's /etc/resolv.conf and upstream DNS configuration rather than CoreDNS.

4. TLS certificate error
The registry uses a self-signed certificate that the container runtime does not trust. Add the CA certificate to the runtime's trust store.

5. Image platform mismatch
The image was built for a different architecture (e.g., an amd64-only image on an arm64 node). Use multi-arch images or node selectors.

Step-by-Step Troubleshooting

1. Get the Error Details

Start by identifying which pod is affected and what image it is trying to pull.

kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

In the Events section, look for the specific error message. Common patterns include:

Failed to pull image "nginx:latst": rpc error: code = NotFound desc = failed to pull and unpack image
Failed to pull image "private.registry.io/app:v1": unexpected status code 401 Unauthorized

The error message is your most important clue for identifying the root cause.
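
These patterns map fairly mechanically onto root causes. As an illustration, here is a small hypothetical shell helper that classifies an event message; the substrings are ones commonly seen in containerd and CRI-O errors:

```shell
# Hypothetical helper: map a pull-error message to a likely root cause.
classify_pull_error() {
  case "$1" in
    *"not found"*|*NotFound*)         echo "bad image name or tag" ;;
    *401*|*[Uu]nauthorized*)          echo "registry authentication failure" ;;
    *"no such host"*|*"i/o timeout"*) echo "DNS or network problem" ;;
    *x509*)                           echo "TLS certificate not trusted" ;;
    *)                                echo "unknown - check runtime logs" ;;
  esac
}
classify_pull_error "rpc error: code = NotFound desc = failed to pull and unpack image"
```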

2. Validate the Image Reference

Check for typos in the image name, tag, or registry URL.

# See the exact image reference
kubectl get pod <pod-name> -o jsonpath='{.spec.containers[*].image}'

# Verify it exists (from a machine with registry access)
docker manifest inspect nginx:latst
crane digest private.registry.io/app:v1

Common image reference mistakes:

  • Typos in the image name or tag (nginx:latst instead of nginx:latest)
  • Missing registry prefix for private images
  • Wrong port for a private registry (registry:5000 vs registry:443)
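
Image references have more structure than they appear to: an optional registry host (the part before the first slash, if it contains a dot, a colon, or is localhost), a repository path, and a tag that defaults to latest. A rough parser, useful for spotting the mistakes above (a sketch of the Docker convention, not the normative reference grammar):

```shell
# Sketch: split an image reference into registry, repository, and tag.
# Assumes the Docker convention: no registry host means docker.io, and a
# missing tag means latest. Digest references are not handled here.
parse_image_ref() {
  ref="$1"; tag="latest"; registry="docker.io"; repo="$ref"
  case "$ref" in
    *:*)
      tail="${ref##*:}"
      case "$tail" in
        */*) ;;                              # colon was a registry port
        *)   tag="$tail"; repo="${ref%:*}" ;;
      esac ;;
  esac
  case "$repo" in
    */*)
      first="${repo%%/*}"
      case "$first" in
        *.*|*:*|localhost) registry="$first"; repo="${repo#*/}" ;;
      esac ;;
  esac
  echo "registry=$registry repository=$repo tag=$tag"
}
parse_image_ref "private.registry.io/app:v1"
```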

3. Test Registry Authentication

If the image is in a private registry, verify authentication.

# Check if imagePullSecrets are configured
kubectl get pod <pod-name> -o jsonpath='{.spec.imagePullSecrets[*].name}'

# Verify the secret exists and is valid
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data.\.dockerconfigjson}' | base64 -d | jq .

# Test authentication from your machine
docker login private.registry.io
docker pull private.registry.io/app:v1

If the secret is missing or expired, create a new one:

kubectl create secret docker-registry my-registry-cred \
  --docker-server=private.registry.io \
  --docker-username=<user> \
  --docker-password=<token> \
  -n <namespace>
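
The secret this command produces is just a base64-encoded .dockerconfigjson. Reconstructing one by hand (with throwaway credentials) makes the decoded output from the earlier jq command easier to read:

```shell
# The "auth" field is base64("username:password"); the credentials here
# are throwaway examples, not real tokens.
user="deploy-bot"; token="example-token"
auth=$(printf '%s:%s' "$user" "$token" | base64)
printf '{"auths":{"private.registry.io":{"auth":"%s"}}}\n' "$auth"
```

If the decoded "auth" value does not round-trip to the username and token you expect, the secret was created with stale credentials.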

4. Check TLS and Certificate Issues

If your registry uses self-signed certificates, the container runtime may reject the connection.

# Debug from a node
kubectl debug node/<node-name> -it --image=alpine -- sh
apk add curl
curl -v https://private.registry.io/v2/

For containerd, add the registry's CA certificate to the node's trust store or configure it in /etc/containerd/certs.d/:

# /etc/containerd/certs.d/private.registry.io/hosts.toml
[host."https://private.registry.io"]
  ca = "/etc/containerd/certs.d/private.registry.io/ca.crt"

5. Verify Network Connectivity from the Node

The node running the pod must be able to reach the registry.

# Check which node the pod is scheduled on
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}'

# Debug from that node
kubectl debug node/<node-name> -it --image=busybox -- sh

# Test DNS
nslookup private.registry.io

# Test HTTPS connectivity
wget -O /dev/null https://private.registry.io/v2/

Check for:

  • Security groups or network ACLs blocking node egress (NetworkPolicies apply to pod traffic, not to image pulls made by the kubelet)
  • Firewall rules preventing outbound HTTPS (port 443)
  • Proxy settings required but not configured on the container runtime
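
On the proxy point: the container runtime itself, not just your shell, needs the proxy settings. For containerd under systemd, a drop-in file like the following is the usual pattern (the proxy host and exclusion list here are placeholders):

```
# /etc/systemd/system/containerd.service.d/http-proxy.conf (example path)
[Service]
Environment="HTTPS_PROXY=http://proxy.internal:3128"
Environment="NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local"
```

Apply it with systemctl daemon-reload && systemctl restart containerd.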

6. Check for Architecture Mismatches

If you run a mixed-architecture cluster (amd64 and arm64 nodes), ensure the image supports the target platform.

# Check the node architecture
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.architecture}'

# Check image platforms
docker manifest inspect --verbose nginx:latest | jq '.[].Descriptor.platform'
crane manifest nginx:latest | jq '.manifests[].platform'

If the image does not support the node architecture, either build a multi-arch image or use a nodeSelector to pin the pod to a compatible node.
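
The nodeSelector route looks like this in a Deployment template (a sketch; the kubernetes.io/arch label is set automatically by the kubelet on each node):

```yaml
# Hypothetical fragment: pin pods to amd64 nodes when the image has no
# arm64 variant.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/arch: amd64
```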

7. Check Container Runtime Logs

If the above steps do not reveal the issue, check the container runtime logs on the node.

# For containerd
journalctl -u containerd --since "10 minutes ago" | grep -i "pull\|error"

# For CRI-O
journalctl -u crio --since "10 minutes ago" | grep -i "pull\|error"

8. Apply the Fix

Once you identify the root cause, apply the appropriate fix:

# Fix image reference
kubectl set image deployment/<name> <container>=correct-image:tag

# Add imagePullSecrets via the ServiceAccount (imagePullSecrets on a running
# pod are immutable, so patching the pod itself will be rejected)
kubectl patch serviceaccount default -n <namespace> \
  -p '{"imagePullSecrets":[{"name":"my-cred"}]}'

# For deployments, update the template
kubectl edit deployment <name>

9. Confirm Resolution

# Watch the pod transition to Running
kubectl get pods -w

# Verify the image was pulled
kubectl describe pod <pod-name> | grep "Successfully pulled"

The pod events should show Successfully pulled image and the status should move to Running.

How to Explain This in an Interview

I would explain that ErrImagePull is the initial failure state before ImagePullBackOff kicks in. They represent the same underlying problem — a failed image pull — but at different stages of the retry cycle. I would describe my approach: first verify the image reference and tag, then check credentials and network, and finally look at runtime-level issues like TLS trust. In production, I prevent this by pinning digests, using admission webhooks to validate images, and pre-pulling critical images.

Prevention

  • Validate image references in CI/CD pipelines before deploying
  • Use image digest pinning instead of mutable tags
  • Configure imagePullSecrets on ServiceAccounts for private registries
  • Set up registry mirrors or caches for reliability
  • Use admission controllers like Kyverno to enforce image policies

Related Errors