Kubernetes FailedAttachVolume

Causes and Fixes

FailedAttachVolume is a warning event indicating that a volume could not be attached to the node where a pod is scheduled. This is common with cloud block storage (EBS, PD, Azure Disk) and occurs when the volume attachment operation fails at the infrastructure level, preventing the pod from starting.

Symptoms

  • Pod events show 'FailedAttachVolume' warning
  • Pod stuck in ContainerCreating state
  • Events show 'AttachVolume.Attach failed' with a provider-specific error
  • VolumeAttachment objects show detach/attach errors
  • Pod cannot start because the volume is not available on the node

Common Causes

1. Volume already attached to another node
Block storage volumes (EBS, PD, Azure Disk) can be attached to only one node at a time. If the volume is still attached to a previous node (because of an ungraceful termination or a stuck detach), the new attach fails with a Multi-Attach error.

2. Node volume attachment limit reached
Each node supports a maximum number of attached volumes (commonly 25-40 for AWS EC2 instances, depending on type). Exceeding this limit prevents additional attachments.

3. Volume is in a different availability zone
Cloud block storage is zone-scoped. If the pod is scheduled to a node in zone-a but the volume was provisioned in zone-b, the attach fails.

4. Cloud API permissions issue
The node's IAM role or the CSI controller's service account lacks permission to attach volumes, so API calls to the cloud provider fail.

5. Volume does not exist
The underlying storage volume was deleted externally (in the cloud console or by another process), but the PV still references it.

6. CSI driver not functioning
The CSI driver responsible for volume attachment is not running on the node, has crashed, or cannot reach the cloud API.

Step-by-Step Troubleshooting

FailedAttachVolume errors prevent pods from starting because the required storage volume cannot be attached to the node. This guide covers diagnosis across cloud providers and resolution strategies.

1. Check Pod Events for the Specific Error

Start by examining the pod's events for the detailed error message.

kubectl describe pod <pod-name>

Look at the Events section for entries like:

  • FailedAttachVolume: Multi-Attach error for volume "pvc-xxxxx"
  • AttachVolume.Attach failed: ... volume is already attached to an instance
  • AttachVolume.Attach failed: ... Maximum number of volumes reached

The error message from the cloud provider is embedded in the event and is the most important diagnostic clue.
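To isolate the relevant events from the rest of the describe output, you can filter the event stream directly:

# Show only warning events for the affected pod, newest last
kubectl get events --field-selector involvedObject.name=<pod-name>,type=Warning --sort-by=.lastTimestamp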

2. Check the VolumeAttachment Resource

Kubernetes tracks volume attachments as API resources.

# List all VolumeAttachments
kubectl get volumeattachment

# Find attachments for the specific PV
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached

If a VolumeAttachment shows the volume attached to a different node than where the pod is scheduled, or if it shows attached: false with errors, you have found the problem.

# Get detailed attachment status
kubectl describe volumeattachment <attachment-name>
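For reference, a stuck attachment typically looks like the following. This is an illustrative object, not real output; the name, node, and error text vary by provider and driver:

apiVersion: storage.k8s.io/v1
kind: VolumeAttachment
metadata:
  name: csi-<hash>
spec:
  attacher: ebs.csi.aws.com
  nodeName: <old-node-name>
  source:
    persistentVolumeName: pvc-xxxxx
status:
  attached: false
  attachError:
    message: 'rpc error: ... volume is already attached to an instance'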

3. Check Where the Volume Is Currently Attached

If the volume is attached to another node, identify why it has not been detached.

# Check the PV to find the volume ID
kubectl get pv <pv-name> -o jsonpath='{.spec.csi.volumeHandle}'
# or for in-tree plugins
kubectl get pv <pv-name> -o jsonpath='{.spec.awsElasticBlockStore.volumeID}'

# Check which node has the volume
kubectl get volumeattachment -o json | jq '.items[] | select(.spec.source.persistentVolumeName=="<pv-name>") | {node: .spec.nodeName, attached: .status.attached}'

The volume may be stuck on a node that is NotReady, terminated, or had a pod that terminated ungracefully.

# Check the node status
kubectl get node <attached-node>

4. Check Node Volume Limits

Nodes have maximum volume attachment limits.

# Check how many volumes are attached to the target node
kubectl get csinode <node-name> -o yaml

# Check node allocatable for volume count
kubectl get node <node-name> -o jsonpath='{.status.allocatable}' | jq .

# Count current volume attachments for the node
kubectl get volumeattachment -o json | jq '[.items[] | select(.spec.nodeName=="<node-name>")] | length'

AWS EBS attachment limits vary by instance type: most Nitro-based instances share 28 attachment slots among EBS volumes, network interfaces, and NVMe instance store devices, while older instance types allow up to roughly 39 volumes. If you are at the limit, move pods to other nodes or use instance types with higher limits.
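For CSI volumes, the limit the scheduler uses can be read directly from the CSINode object. The excerpt below is illustrative (the driver name assumes the AWS EBS CSI driver; your node ID and count will differ):

apiVersion: storage.k8s.io/v1
kind: CSINode
metadata:
  name: <node-name>
spec:
  drivers:
  - name: ebs.csi.aws.com
    nodeID: <cloud-instance-id>
    allocatable:
      count: 25   # maximum volumes this driver can attach to this node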

5. Check Zone Compatibility

Verify the volume and node are in the same availability zone.

# Check the volume's zone (from the PV)
kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity}'

# Check the node's zone
kubectl get node <node-name> -L topology.kubernetes.io/zone

# Check if the PV has topology constraints
kubectl get pv <pv-name> -o yaml | grep -A10 nodeAffinity

If there is a zone mismatch, the pod needs to be scheduled to a node in the same zone as the volume. Use WaitForFirstConsumer binding mode in the StorageClass to prevent this issue.
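With delayed binding, provisioning is deferred until a pod using the claim is scheduled, so the volume is created in that pod's zone. A minimal sketch, assuming the AWS EBS CSI driver (the provisioner and parameters are driver-specific):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-topology-aware
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
parameters:
  type: gp3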

6. Check the CSI Driver

If using CSI volumes, verify the CSI driver is healthy.

# Check CSI driver pods
kubectl get pods -n kube-system | grep csi

# Check the CSI driver on the target node
kubectl get pods -n kube-system --field-selector spec.nodeName=<node-name> | grep csi

# Check CSI driver logs
kubectl logs -n kube-system <csi-controller-pod> -c csi-attacher --tail=100

# Verify CSINode registration
kubectl get csinode <node-name> -o yaml

If the CSI node plugin is not running on the target node, volume attachment will fail.
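If the node plugin pods are missing or crashlooping, restarting the DaemonSet often re-registers the driver. The DaemonSet name depends on your installation (for example, ebs-csi-node for the AWS EBS driver):

kubectl rollout restart daemonset <csi-node-daemonset> -n kube-system
kubectl rollout status daemonset <csi-node-daemonset> -n kube-system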

7. Force Detach a Stuck Volume

If the volume is stuck on an old node that is no longer available, you may need to force detach it.

# Delete the old VolumeAttachment
kubectl delete volumeattachment <old-attachment-name>

# If the object hangs in Terminating, force deletion alone will not help:
# a finalizer added by the CSI external-attacher is usually blocking it
kubectl delete volumeattachment <old-attachment-name> --grace-period=0 --force

For cloud-specific force detach:

# AWS
aws ec2 detach-volume --volume-id <vol-id> --force

# GCP
gcloud compute instances detach-disk <instance-name> --disk=<disk-name> --zone=<zone>

# Azure
az vm disk detach --resource-group <rg> --vm-name <vm> --name <disk-name>

Warning: Force detaching a volume that is actively being written to can cause data corruption. Only force detach when you are certain the volume is not in active use.
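If the VolumeAttachment sits in Terminating after deletion, a finalizer set by the CSI external-attacher is usually holding it. As a last resort you can remove the finalizer manually; the same data-corruption warning applies:

# Inspect the finalizers holding the object
kubectl get volumeattachment <old-attachment-name> -o jsonpath='{.metadata.finalizers}'

# Clear them so deletion can complete
kubectl patch volumeattachment <old-attachment-name> --type=merge -p '{"metadata":{"finalizers":null}}'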

8. Verify Cloud Provider Permissions

Check that the node has permissions to attach volumes.

# Check the CSI controller's service account
kubectl get deployment -n kube-system <csi-controller-deployment> -o jsonpath='{.spec.template.spec.serviceAccountName}'

# For AWS, check the IAM role
# For GCP, check the service account
# For Azure, check the managed identity

# Check CSI controller logs for permission errors
kubectl logs -n kube-system <csi-controller-pod> --tail=100 | grep -i "forbidden\|unauthorized\|access denied"

9. Delete and Reschedule the Pod

After resolving the attachment issue, delete the stuck pod so it can be rescheduled and the attach retried.

# Delete the stuck pod
kubectl delete pod <pod-name>

# If the pod is managed by a controller, it will be recreated
# Watch for the new pod
kubectl get pods -w | grep <deployment-name>
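For Deployment-managed pods, you can restart declaratively instead of deleting pods by hand:

kubectl rollout restart deployment <deployment-name>
kubectl rollout status deployment <deployment-name>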

10. Verify Volume Is Attached and Pod Starts

Confirm the volume attachment succeeded and the pod is running.

# Check VolumeAttachment status
kubectl get volumeattachment -o custom-columns=NAME:.metadata.name,PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName,ATTACHED:.status.attached | grep <pv-name>

# Check the pod status
kubectl get pod <pod-name>

# Verify the volume is mounted inside the pod
kubectl exec <pod-name> -- df -h | grep <mount-path>
kubectl exec <pod-name> -- ls -la <mount-path>

The volume is successfully attached when the VolumeAttachment shows attached: true, the pod transitions to Running, and the mount path is accessible from inside the container.

How to Explain This in an Interview

I would explain the volume attachment lifecycle: after the scheduler places a pod on a node, the attach/detach controller (running in kube-controller-manager or as a CSI controller) calls the cloud API to attach the volume to the node. Once attached, the kubelet on the node mounts the volume into the pod. I'd discuss how the VolumeAttachment API resource tracks this state, the difference between in-tree and CSI volume plugins, and the maxVolumesPerNode limit. For multi-attach errors specifically, I'd explain that block volumes are fundamentally single-attach in most cloud providers and discuss the force-detach mechanisms and their data safety implications.

Prevention

  • Use WaitForFirstConsumer StorageClass binding mode to co-locate volumes and pods
  • Monitor node volume attachment counts against limits
  • Ensure proper pod termination and volume detachment during node maintenance
  • Configure pod disruption budgets to manage voluntary disruptions
  • Use topology-aware scheduling to keep pods in the same zone as their volumes

Related Errors