Kubernetes ImageInspectError
Causes and Fixes
ImageInspectError occurs when the container runtime fails to inspect a container image after it has been pulled. This means the image was downloaded but the runtime cannot read its metadata, typically due to a corrupt image, incompatible image format, or a container runtime issue on the node.
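The inspection step can be reproduced by hand with `crictl`, the CRI debugging CLI, if you have access to the node. A minimal sketch (the image reference is a placeholder; run it on the node itself, or via `chroot /host` from a node debug pod):

```shell
# Reproduce the runtime's image inspection manually
crictl inspecti <image>
# A healthy image prints its metadata as JSON: entrypoint, env, layer digests, etc.
# On a corrupt image this command fails -- the same failure the kubelet
# surfaces as ImageInspectError.
```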
Symptoms
- Pod status shows ImageInspectError in kubectl get pods output
- Container stays in Waiting state after image pull
- kubectl describe pod shows image inspection failure events
- The image appears to pull successfully but the container cannot start
- Other pods on the same node may work fine with different images
Common Causes
- A corrupt or partially written image on the node, often caused by disk exhaustion during the pull
- An image format or media type the node's container runtime does not support
- Corrupt storage driver or snapshotter state in the container runtime
- An outdated or buggy container runtime version on the node
- Filesystem errors in the runtime's storage directory on the node
Step-by-Step Troubleshooting
1. Identify the Affected Pod and Node
kubectl describe pod <pod-name> -n <namespace>
Note the exact error message in the events, then confirm which node the pod is scheduled on:
kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}'
2. Check if the Issue is Node-Specific
Test whether the same image works on a different node. Pin the test pod to a known-good node so it does not land back on the affected one:
# Run a test pod with the same image, pinned to another node
kubectl run test-image --image=<image> --restart=Never \
  --overrides='{"spec":{"nodeName":"<other-node>"}}' \
  --command -- sleep 60
# Check if it starts successfully
kubectl get pod test-image -o wide
# If it runs there, the issue is specific to the original node
3. Check Node Disk Space
Insufficient disk space is a common cause of image corruption.
# Check node conditions
kubectl describe node <node-name> | grep -A5 Conditions
# Check disk usage on the node (a node debug pod mounts the host filesystem
# at /host, so run host commands via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host df -h
# Check container runtime storage
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host du -sh /var/lib/containerd/
If disk space is low, trigger garbage collection:
# Remove unused images on the node (crictl lives on the host, not in the
# ubuntu debug image, so invoke it via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host crictl rmi --prune
4. Check Container Runtime Logs
The runtime logs will have more details about why the inspection failed.
# Find the node
NODE=$(kubectl get pod <pod-name> -o jsonpath='{.spec.nodeName}')
# Check containerd logs (run journalctl from the host via chroot /host)
kubectl debug node/$NODE -it --image=ubuntu -- chroot /host bash -c "journalctl -u containerd --since '30 minutes ago' | grep -iE 'inspect|error|corrupt'"
# Check kubelet logs
kubectl debug node/$NODE -it --image=ubuntu -- chroot /host bash -c "journalctl -u kubelet --since '30 minutes ago' | grep -iE 'inspect|image'"
5. Remove and Re-Pull the Image
If the image is corrupt on the node, remove it and let Kubernetes re-pull.
# List images on the node (crictl runs on the host via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host bash -c "crictl images | grep <image-name>"
# Remove the corrupt image
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host crictl rmi <image-id>
# Delete the pod to trigger a fresh pull
kubectl delete pod <pod-name>
6. Verify Image Integrity
Check whether the image itself is valid by pulling and inspecting it from a different machine.
# Pull the image locally
docker pull <image>
# Inspect the image
docker inspect <image>
# Verify the image manifest
crane manifest <image> | jq .
# List the tags in the repository
crane ls <image-repo>
If the image is corrupt in the registry, rebuild and push it.
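A rebuild looks roughly like the following; this is a sketch that assumes a local Dockerfile and push access to the registry (the image reference is a placeholder):

```shell
# Rebuild the image from source and push it, replacing the corrupt registry copy
docker build -t <image> .
docker push <image>
# Confirm the pushed manifest is readable before re-deploying
crane digest <image>
```

Prefer referencing the new digest in the pod spec rather than reusing a mutable tag, so nodes cannot keep serving the old cached copy.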
7. Check Container Runtime Version
Ensure the container runtime is up to date and supports the image format.
# Check runtime version on the node
kubectl get node <node-name> -o jsonpath='{.status.nodeInfo.containerRuntimeVersion}'
# Check all nodes
kubectl get nodes -o custom-columns='NODE:.metadata.name,RUNTIME:.status.nodeInfo.containerRuntimeVersion'
If the runtime is outdated, plan an upgrade. Check the runtime's release notes for known image inspection bugs.
8. Check Storage Driver Health
If the container runtime's storage driver is corrupt, multiple images may be affected.
# Check storage driver info (crictl runs on the host via chroot /host)
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host bash -c "crictl info | grep -A10 storage"
# Check the kernel log for filesystem errors
kubectl debug node/<node-name> -it --image=ubuntu -- chroot /host bash -c "dmesg | grep -iE 'error|corrupt|ext4|xfs'"
If the storage driver is corrupt, the node may need to be drained and its container storage directory cleaned:
# Drain the node first
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Then clean container storage (disruptive). Run this over SSH on the node
# itself: stopping containerd from a debug pod would kill the debug pod
# mid-command, since containerd manages it.
systemctl stop containerd
rm -rf /var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/*
systemctl start containerd
# Uncordon the node
kubectl uncordon <node-name>
9. Verify the Fix
# Watch the pod restart
kubectl get pods -n <namespace> -w
# Verify the image was inspected successfully
kubectl describe pod <pod-name> | grep -E "Pulled|Started"
# Confirm the container is running
kubectl get pod <pod-name>
The pod should transition to Running without ImageInspectError.
How to Explain This in an Interview
I would explain that ImageInspectError is relatively rare and occurs after a successful pull but before the container can be created. The runtime needs to inspect the image to extract metadata like the entrypoint, environment variables, and exposed ports. When this fails, it usually indicates a node-level issue rather than a cluster-level configuration problem. I would debug by checking the image on other nodes, verifying disk space, examining runtime logs, and if needed, removing and re-pulling the image on the affected node.
Prevention
- Monitor node disk space and set up alerts before thresholds are reached
- Use image digest pinning to ensure image integrity
- Keep container runtimes updated to the latest stable version
- Run node-problem-detector to catch storage and runtime issues early
- Set up garbage collection policies to prevent disk space exhaustion
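Digest pinning from the list above is expressed directly in the pod spec. A minimal sketch (the name, registry, and digest value are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pinned-example   # hypothetical name
spec:
  containers:
    - name: app
      # Pin by digest instead of tag: the kubelet will only run the exact
      # image content this digest identifies, so a re-tagged or tampered
      # image in the registry cannot be pulled in its place.
      image: registry.example.com/app@sha256:<digest>
```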