Kubernetes ImageInspectError
Causes and Fixes
ImageInspectError occurs when the container runtime fails to inspect a container image after it has been pulled. This means the image was downloaded but the runtime cannot read its metadata, typically due to a corrupt image, incompatible image format, or a container runtime issue on the node.
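The inspection step can be reproduced by hand with `crictl`, the CRI debugging CLI, if you have access to the node. A minimal sketch (the image reference is a placeholder; run it on the node itself, or via `chroot /host` from a node debug pod):

```shell
# Reproduce the runtime's image inspection manually
crictl inspecti <image>
# A healthy image prints its metadata as JSON: entrypoint, env, layer digests, etc.
# On a corrupt image this command fails -- the same failure the kubelet
# surfaces as ImageInspectError.
```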
Symptoms
- Pod status shows ImageInspectError in kubectl get pods output
- Container stays in Waiting state after image pull
- kubectl describe pod shows image inspection failure events
- The image appears to pull successfully but the container cannot start
- Other pods on the same node may work fine with different images
Common Causes
- A corrupt or partially written image on the node, often caused by disk exhaustion during the pull
- An image format or media type the node's container runtime does not support
- Corrupt storage driver or snapshotter state in the container runtime
- An outdated or buggy container runtime version on the node
- Filesystem errors in the runtime's storage directory on the node
Step-by-Step Troubleshooting
1. Identify the Affected Pod and Node
kubectl describe pod <pod-name> -n <namespace>
Note the exact error message in the events, then confirm which node the pod is scheduled on:
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}'
2. Check if the Issue is Node-Specific
Test whether the same image works on a different node. Pin the test pod to a known-good node so it does not land back on the affected one:
# Run a test pod with the same image, pinned to another node
kubectl run test-image --image=<image> --restart=Never \
  --overrides='{"spec":{"nodeName":"<other-node>"}}' \
  --command -- sleep 60
# Check if it starts successfully
kubectl get pod test-image -o wide
# If it runs there, the issue is specific to the original node
3. Check Node Disk Space
Insufficient disk space is a common cause of image corruption.
# Check node conditions
kubectl describe node <node-name> | grep -A5 Conditions
# Check disk usage on the node (a node debug pod mounts the host filesystem
# at /host, so run host commands via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host df -h
# Check container runtime storage
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host du -sh /var/lib/containerd/
If disk space is low, trigger garbage collection:
# Remove unused images on the node (crictl lives on the host, not in the
# ubuntu debug image, so invoke it via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host crictl rmi --prune
4. Check Container Runtime Logs
The runtime logs will have more details about why the inspection failed.
# Find the node
NODE=$(kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}')
# Check containerd logs (run journalctl from the host via chroot /host)
kubectl debug node/$NODE -it --image=ubuntu -- chroot /host bash -c "journalctl -u containerd --since '30 minutes ago' | grep -iE 'inspect|error|corrupt'"
# Check kubelet logs
kubectl debug node/$NODE -it --image=ubuntu -- chroot /host bash -c "journalctl -u kubelet --since '30 minutes ago' | grep -iE 'inspect|image'"
5. Remove and Re-Pull the Image
If the image is corrupt on the node, remove it and let Kubernetes re-pull.
# List images on the node (crictl runs on the host via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host bash -c "crictl images | grep <image-name>"
# Remove the corrupt image
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host crictl rmi <image-id>
# Delete the pod to trigger a fresh pull
kubectl delete pod <pod-name>
6. Verify Image Integrity
Check whether the image itself is valid by pulling and inspecting it from a different machine.
# Pull the image locally
docker pull <image>
# Inspect the image
docker inspect <image>
# Verify the image manifest
crane manifest <image> | jq .
# List the tags in the repository
crane ls <image-repo>
If the image is corrupt in the registry, rebuild and push it.
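A rebuild looks roughly like the following; this is a sketch that assumes a local Dockerfile and push access to the registry (the image reference is a placeholder):

```shell
# Rebuild the image from source and push it, replacing the corrupt registry copy
docker build -t <image> .
docker push <image>
# Confirm the pushed manifest is readable before re-deploying
crane digest <image>
```

Prefer referencing the new digest in the pod spec rather than reusing a mutable tag, so nodes cannot keep serving the old cached copy.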
7. Check Container Runtime Version
Ensure the container runtime is up to date and supports the image format.
# Check runtime version on the node
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
# Check all nodes
kubectl get nodes -o custom-columns='NODE:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion'
If the runtime is outdated, plan an upgrade. Check the runtime's release notes for known image inspection bugs.
8. Check Storage Driver Health
If the container runtime's storage driver is corrupt, multiple images may be affected.
# Check storage driver info (crictl runs on the host via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host bash -c "crictl info | grep -A10 storage"
# Check the kernel log for filesystem errors
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host bash -c "dmesg | grep -iE 'error|corrupt|ext4|xfs'"
If the storage driver is corrupt, the node may need to be drained and its container storage directory cleaned:
# Drain the node first
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Then clean container storage (disruptive). Run this over SSH on the node
# itself: stopping containerd from a debug pod would kill the debug pod
# mid-command, since containerd manages it.
systemctl stop containerd
rm -rf /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/*
systemctl start containerd
# Uncordon the node
kubectl uncordon <node-name>
9. Verify the Fix
# Watch the pod restart
kubectl get pods -n <namespace> -w
# Verify the image was inspected successfully
kubectl describe pod <pod-name> | grep -E "Pulled|Started"
# Confirm the container is running
kubectl get pod <pod-name>
The pod should transition to Running without ImageInspectError.
How to Explain This in an Interview
I would explain that ImageInspectError is relatively rare and occurs after a successful pull but before the container can be created. The runtime needs to inspect the image to extract metadata like the entrypoint, environment variables, and exposed ports. When this fails, it usually indicates a node-level issue rather than a cluster-level configuration problem. I would debug by checking the image on other nodes, verifying disk space, examining runtime logs, and if needed, removing and re-pulling the image on the affected node.
Prevention
- Monitor node disk space and set up alerts before thresholds are reached
- Use image digest pinning to ensure image integrity
- Keep container runtimes updated to the latest stable version
- Run node-problem-detector to catch storage and runtime issues early
- Set up garbage collection policies to prevent disk space exhaustion
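Digest pinning from the list above is expressed directly in the pod spec. A minimal sketch (the name, registry, and digest value are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-example   # hypothetical name
spec:
  containers:
    - name: app
      # Pin by digest instead of tag: the kubelet will only run the exact
      # image content this digest identifies, so a re-tagged or tampered
      # image in the registry cannot be pulled in its place.
      image: registry.example.com/app@sha256:<digest>
```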