Kubernetes FailedAttachVolume
Causes and Fixes
FailedAttachVolume is a warning event indicating that a volume could not be attached to the node where a pod is scheduled. This is common with cloud block storage (EBS, PD, Azure Disk) and occurs when the volume attachment operation fails at the infrastructure level, preventing the pod from starting.
Symptoms
- Pod events show 'FailedAttachVolume' warning
- Pod stuck in ContainerCreating state
- Events show 'AttachVolume.Attach failed' with a provider-specific error
- VolumeAttachment objects show detach/attach errors
- Pod cannot start because the volume is not available on the node
Common Causes
- The volume is still attached to another node (multi-attach error), often after an ungraceful node failure
- The target node has reached its maximum volume attachment limit
- The volume and the node are in different availability zones
- The CSI driver is unhealthy or not registered on the target node
- The CSI controller lacks cloud permissions to attach volumes
- A stale VolumeAttachment object is blocking a new attach
Step-by-Step Troubleshooting
FailedAttachVolume errors prevent pods from starting because the required storage volume cannot be attached to the node. This guide covers diagnosis across cloud providers and resolution strategies.
1. Check Pod Events for the Specific Error
Start by examining the pod's events for the detailed error message.
kubectl describe pod <pod-name>
Look at the Events section for entries like:
FailedAttachVolume: Multi-Attach error for volume "pvc-xxxxx"
AttachVolume.Attach failed: ... volume is already attached to an instance
AttachVolume.Attach failed: ... Maximum number of volumes reached
The error message from the cloud provider is embedded in the event and is the most important diagnostic clue.
2. Check the VolumeAttachment Resource
Kubernetes tracks volume attachments as API resources.
# List all VolumeAttachments
kubectl get volumeattachment
# Find attachments for the specific PV
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached
If a VolumeAttachment shows the volume attached to a different node than where the pod is scheduled, or if it shows attached: false with errors, you have found the problem.
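For reference, a VolumeAttachment stuck on a failed node typically looks something like the sketch below. The metadata name, node name, and driver are illustrative, not values from a real cluster:

```yaml
# Illustrative VolumeAttachment -- name, node, and driver are hypothetical
apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-0a1b2c3d
spec:
  attacher: ebs.csi.aws.com
  nodeName: ip-10-0-1-23.ec2.internal   # the OLD node still holding the volume
  source:
    persistentVolumeName: pvc-xxxxx
status:
  attached: true
  detachError:
    message: "rpc error: ... volume is still attached to another node"
```

A `detachError` in status combined with `attached: true` on a node that no longer runs the pod is the classic signature of a stuck attachment.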
# Get detailed attachment status
kubectl describe volumeattachment <attachment-name>
3. Check Where the Volume Is Currently Attached
If the volume is attached to another node, identify why it has not been detached.
# Check the PV to find the volume ID
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'
# or for in-tree plugins
kubectl get pv <pv-name> -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'
# Check which node has the volume
kubectl get volumeattachment -o json | jq '.items[] | select(.spec.source.persistentVolumeName=="<pv-name>") | {node: .spec.nodeName, attached: .status.attached}'
The volume may be stuck on a node that is NotReady, terminated, or had a pod that terminated ungracefully.
# Check the node status
kubectl get node <attached-node>
4. Check Node Volume Limits
Nodes have maximum volume attachment limits.
# Check how many volumes are attached to the target node
kubectl get csinode <node-name> -o yaml
# Check node allocatable for volume count
kubectl get node <node-name> -o jsonpath='{.status.allocatable}' | jq .
# Count current volume attachments for the node
kubectl get volumeattachment -o json | jq '[.items[] | select(.spec.nodeName=="<node-name>")] | length'
AWS EBS attachment limits vary by instance type: the Kubernetes scheduler has historically assumed 39 volumes for older instance generations and 25 for Nitro-based types, and on Nitro instances EBS volumes share the attachment limit with ENIs and NVMe instance-store devices. If you are at the limit, move pods to other nodes or use instance types with higher limits.
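If the limit the driver reports does not match reality (for example, because ENIs consume attachment slots), some CSI drivers let you override it. A sketch for the AWS EBS CSI node plugin, assuming you manage its DaemonSet directly; the image tag and limit value here are placeholders:

```yaml
# Fragment of the EBS CSI node DaemonSet spec -- tune the limit for your instance types
containers:
  - name: ebs-plugin
    image: public.ecr.aws/ebs-csi-driver/aws-ebs-csi-driver:v1.25.0  # example tag
    args:
      - node
      - --volume-attach-limit=20   # reserve slots for ENIs and instance-store devices
```

After changing the limit, the node must re-register (restart the node plugin pod) for the new value to appear in the CSINode object.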
5. Check Zone Compatibility
Verify the volume and node are in the same availability zone.
# Check the volume's zone (from the PV)
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'
# Check the node's zone
kubectl get node <node-name> -L topology.kubernetes.io/zone
# Check if the PV has topology constraints
kubectl get pv <pv-name> -o yaml | grep -A10 nodeAffinity
If there is a zone mismatch, the pod needs to be scheduled to a node in the same zone as the volume. Use WaitForFirstConsumer binding mode in the StorageClass to prevent this issue.
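The fix above boils down to provisioning the volume only where the pod will actually run. A minimal StorageClass sketch with delayed binding; the name is hypothetical, and the provisioner and parameters assume the AWS EBS CSI driver, so adjust for your platform:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-wait-for-consumer        # hypothetical name
provisioner: ebs.csi.aws.com
parameters:
  type: gp3
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until the pod is scheduled
reclaimPolicy: Delete
```

With WaitForFirstConsumer, the PV is created in the zone of the node the scheduler picks, so a zone mismatch cannot occur for newly provisioned volumes.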
6. Check the CSI Driver
If using CSI volumes, verify the CSI driver is healthy.
# Check CSI driver pods
kubectl get pods -n kube-system | grep csi
# Check the CSI driver on the target node
kubectl get pods -n kube-system --field-selector spec.nodeName=<node-name> | grep csi
# Check CSI driver logs
kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher --tail=100
# Verify CSINode registration
kubectl get csinode <node-name> -o yaml
If the CSI node plugin is not running on the target node, volume attachment will fail.
7. Force Detach a Stuck Volume
If the volume is stuck on an old node that is no longer available, you may need to force detach it.
# Delete the old VolumeAttachment
kubectl delete volumeattachment <old-attachment-name>
# If that does not work, try force deleting
kubectl delete volumeattachment <old-attachment-name> --grace-period=0 --force
For cloud-specific force detach:
# AWS
aws ec2 detach-volume --volume-id <vol-id> --force
# GCP
gcloud compute instances detach-disk <instance-name> --disk=<disk-name> --zone=<zone>
# Azure
az vm disk detach --resource-group <rg> --vm-name <vm> --name <disk-name>
Warning: Force detaching a volume that is actively being written to can cause data corruption. Only force detach when you are certain the volume is not in active use.
8. Verify Cloud Provider Permissions
Check that the node has permissions to attach volumes.
# Check the CSI controller's service account
kubectl get deployment -n kube-system <csi-controller-deployment> -o jsonpath='{.spec.template.spec.serviceAccountName}'
# For AWS, check the IAM role
# For GCP, check the service account
# For Azure, check the managed identity
# Check CSI controller logs for permission errors
kubectl logs -n kube-system <csi-controller-pod> --tail=100 | grep -i "forbidden\|unauthorized\|access denied"
9. Delete and Reschedule the Pod
After resolving the attachment issue, the pod may need to be deleted so that scheduling and the attach operation are retried.
# Delete the stuck pod
kubectl delete pod <pod-name>
# If the pod is managed by a controller, it will be recreated
# Watch for the new pod
kubectl get pods -w | grep <deployment-name>
10. Verify Volume Is Attached and Pod Starts
Confirm the volume attachment succeeded and the pod is running.
# Check VolumeAttachment status
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached | grep <pv-name>
# Check the pod status
kubectl get pod <pod-name>
# Verify the volume is mounted inside the pod
kubectl exec <pod-name> -- df -h | grep <mount-path>
kubectl exec <pod-name> -- ls -la <mount-path>
The volume is successfully attached when the VolumeAttachment shows attached: true, the pod transitions to Running, and the mount path is accessible from inside the container.
How to Explain This in an Interview
I would explain the volume attachment lifecycle: after the scheduler places a pod on a node, the attach/detach controller in kube-controller-manager creates a VolumeAttachment object, and the CSI external-attacher (or, for legacy in-tree plugins, the controller itself) calls the cloud API to attach the volume to the node. Once attached, the kubelet on the node mounts the volume into the pod. I'd discuss how the VolumeAttachment API resource tracks this state, the difference between in-tree and CSI volume plugins, and the maxVolumesPerNode limit. For multi-attach errors specifically, I'd explain that block volumes are fundamentally single-attach in most cloud providers and discuss the force-detach mechanisms and their data safety implications.
Prevention
- Use WaitForFirstConsumer StorageClass binding mode to co-locate volumes and pods
- Monitor node volume attachment counts against limits
- Ensure proper pod termination and volume detachment during node maintenance
- Configure pod disruption budgets to manage voluntary disruptions
- Use topology-aware scheduling to keep pods in the same zone as their volumes