What is the cloud-controller-manager and when is it used?

intermediate | architecture | devops, sre, cloud architect | CKA
TL;DR

The cloud-controller-manager is an optional control plane component that embeds cloud-specific control logic. It separates cloud provider integrations from core Kubernetes code, running controllers for node lifecycle, routes, and load balancers that interact with the underlying cloud provider's API.

Detailed Answer

The cloud-controller-manager (CCM) is a control plane component that runs cloud-specific controller loops. It was introduced to decouple cloud provider integrations from the core Kubernetes codebase, allowing cloud vendors to iterate on their integrations independently of Kubernetes releases.

Why It Exists

Historically, cloud provider logic (creating load balancers, managing node instances, configuring routes) was embedded directly in the kube-controller-manager and the kubelet. This created several problems:

  • Cloud provider code had to follow the Kubernetes release cadence
  • Bugs in one cloud provider's code could affect the entire release
  • The kube-controller-manager binary grew with every cloud provider addition
  • Testing required cloud credentials and infrastructure

The cloud-controller-manager extraction, completed by the removal of the in-tree cloud provider code (KEP-2395), moved cloud-specific logic out of the core binaries and into a separate binary that each cloud provider maintains.

Controllers Within the CCM

Node Controller -- Checks the cloud provider to determine if a node has been deleted in the cloud after it stops responding. It also initializes nodes with cloud-specific metadata like zone labels, instance type, and external IP addresses.

# Observe cloud-specific labels set by the Node controller
kubectl get node worker-1 -o jsonpath='{.metadata.labels}' | python3 -m json.tool

# Typical labels set by the cloud Node controller:
# topology.kubernetes.io/zone: us-east-1a
# topology.kubernetes.io/region: us-east-1
# node.kubernetes.io/instance-type: m5.xlarge
#
# Labels set by the kubelet itself, with or without a cloud provider:
# kubernetes.io/os: linux
# kubernetes.io/arch: amd64
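The Node controller's two jobs described above can be modeled in a few lines. This is a minimal sketch, not the real controller: `Node`, `FakeCloud`, and `reconcile_nodes` are hypothetical names, the cloud API is an in-memory dictionary, and the provider IDs are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    provider_id: str  # illustrative ID linking the Node to a cloud instance
    ready: bool
    labels: dict = field(default_factory=dict)


class FakeCloud:
    """Hypothetical stand-in for a cloud provider API (an in-memory dict)."""

    def __init__(self, instances):
        self.instances = instances  # provider ID -> metadata dict

    def instance_exists(self, provider_id):
        return provider_id in self.instances

    def instance_metadata(self, provider_id):
        return self.instances[provider_id]


def reconcile_nodes(nodes, cloud):
    """One pass of a simplified node loop: returns (kept_nodes, deleted_names)."""
    kept, deleted = [], []
    for node in nodes:
        exists = cloud.instance_exists(node.provider_id)
        # An unresponsive node whose instance is gone from the cloud is removed.
        if not node.ready and not exists:
            deleted.append(node.name)
            continue
        # Nodes backed by a live instance receive cloud-specific metadata.
        if exists:
            meta = cloud.instance_metadata(node.provider_id)
            node.labels["topology.kubernetes.io/zone"] = meta["zone"]
            node.labels["node.kubernetes.io/instance-type"] = meta["instance_type"]
        kept.append(node)
    return kept, deleted
```

Note that an unresponsive node whose instance still exists is kept: the cloud check only confirms deletion, it does not decide health.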

Route Controller -- Configures routes in the cloud infrastructure so that containers on different nodes can communicate with each other. This is cloud-specific because each provider has different networking APIs.
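Route reconciliation boils down to comparing the routes the cluster needs with the routes the cloud currently has. The sketch below assumes each node has one pod CIDR and represents a route as a `(cidr, node_name)` tuple; `plan_routes` is a hypothetical helper, not a provider API.

```python
def plan_routes(node_pod_cidrs, cloud_routes):
    """Compute which routes to create and delete.

    node_pod_cidrs: dict of node name -> pod CIDR assigned to that node
    cloud_routes:   set of (cidr, node_name) tuples currently in the cloud
    """
    desired = {(cidr, node) for node, cidr in node_pod_cidrs.items()}
    to_create = sorted(desired - cloud_routes)  # missing in the cloud
    to_delete = sorted(cloud_routes - desired)  # stale, e.g. for removed nodes
    return to_create, to_delete


if __name__ == "__main__":
    create, delete = plan_routes(
        {"worker-1": "10.244.1.0/24", "worker-2": "10.244.2.0/24"},
        {("10.244.1.0/24", "worker-1"), ("10.244.9.0/24", "worker-gone")},
    )
    print("create:", create)
    print("delete:", delete)
```

A real implementation would then call the provider's networking API for each entry, which is the part that differs between clouds.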

Service Controller -- Watches for Services of type LoadBalancer and provisions cloud load balancers (AWS ELB/NLB, GCP Load Balancer, Azure Load Balancer) to expose them externally:

apiVersion: v1
kind: Service
metadata:
  name: web-external
  annotations:
    # AWS-specific annotations. Which ones take effect depends on whether the
    # CCM or the separate AWS Load Balancer Controller reconciles this Service.
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
    service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 443
    targetPort: 8443
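The Service controller's behavior for a manifest like the one above can be sketched as a reconcile pass over Service objects. This is a simplified model under stated assumptions: Services are plain dicts, `ensure_load_balancer` is a hypothetical callback standing in for the provider API, and events are stored on the dict only for illustration (real events are separate API objects, and their wording varies).

```python
def reconcile_services(services, ensure_load_balancer):
    """Provision a load balancer for each LoadBalancer-type Service.

    ensure_load_balancer(svc) is a hypothetical callback that returns the
    load balancer hostname or raises RuntimeError on a cloud API failure.
    """
    for svc in services:
        if svc["spec"].get("type") != "LoadBalancer":
            continue  # ClusterIP and NodePort Services need no cloud resource
        events = svc.setdefault("events", [])
        events.append("Ensuring load balancer")
        try:
            hostname = ensure_load_balancer(svc)
        except RuntimeError as err:
            # Status stays empty, so the Service keeps showing a pending address.
            events.append(f"Error creating load balancer: {err}")
            continue
        svc["status"] = {"loadBalancer": {"ingress": [{"hostname": hostname}]}}
        events.append("Ensured load balancer")
    return services
```

Until the call succeeds, `status.loadBalancer.ingress` is never written, which is why a failed provisioning attempt shows up as a pending external IP.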

How It Integrates

The CCM runs alongside (not replacing) the kube-controller-manager. The kube-controller-manager must be started with --cloud-provider=external to signal that cloud-related logic is handled by the CCM:

# kube-controller-manager flags for external cloud provider
spec:
  containers:
  - command:
    - kube-controller-manager
    - --cloud-provider=external  # Delegates cloud logic to CCM
    - --kubeconfig=/etc/kubernetes/controller-manager.conf

The kubelet also needs to be configured to use the external cloud provider:

# kubelet configuration
--cloud-provider=external
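One consequence worth knowing, which the text above does not spell out: a kubelet started with --cloud-provider=external registers its node with the taint node.cloudprovider.kubernetes.io/uninitialized, and the CCM removes that taint once it has initialized the node. The sketch below scans parsed `kubectl get nodes -o json` output for nodes still carrying it; `uninitialized_nodes` is a hypothetical helper name.

```python
import json
import sys

UNINITIALIZED_TAINT = "node.cloudprovider.kubernetes.io/uninitialized"


def uninitialized_nodes(node_list):
    """Return names of nodes the CCM has not initialized yet.

    node_list: dict parsed from `kubectl get nodes -o json`.
    """
    names = []
    for item in node_list.get("items", []):
        # spec.taints is absent (or null) on nodes without taints
        taints = item.get("spec", {}).get("taints") or []
        if any(t.get("key") == UNINITIALIZED_TAINT for t in taints):
            names.append(item["metadata"]["name"])
    return names


if __name__ == "__main__":
    # Usage: kubectl get nodes -o json | python3 this_script.py
    for name in uninitialized_nodes(json.load(sys.stdin)):
        print(name)
```

Nodes that stay in this list usually point to a CCM that is not running or cannot reach the cloud API.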

Cloud Provider Implementations

Each cloud provider maintains its own CCM binary:

| Provider | Project |
|----------|---------|
| AWS | cloud-provider-aws |
| GCP | cloud-provider-gcp |
| Azure | cloud-provider-azure |
| OpenStack | cloud-provider-openstack |
| vSphere | cloud-provider-vsphere |

These are typically deployed as a Deployment or DaemonSet in the kube-system namespace:

# Check if a cloud-controller-manager is running
kubectl get pods -n kube-system | grep cloud-controller

# View cloud-controller-manager logs
# (the workload kind and name vary by provider and installer; adjust to match the output above)
kubectl logs -n kube-system deployment/aws-cloud-controller-manager

# Common issues to look for in logs:
# - Cloud API authentication failures
# - Rate limiting from the cloud provider
# - Subnet or security group misconfiguration for LoadBalancer Services

Troubleshooting Cloud Integration

When a LoadBalancer Service is stuck with a Pending external IP:

# Check Service events for cloud provisioning errors
kubectl describe svc web-external

# Common events (exact wording varies by Kubernetes version and provider):
# "Error creating load balancer: AccessDenied"
# "Error creating load balancer: SubnetNotFound"
# "Ensuring load balancer" -> "Ensured load balancer" (success)

# Check the CCM logs for detailed error messages
# (the label selector is an example; check the labels on your CCM Pods)
kubectl logs -n kube-system -l component=cloud-controller-manager --tail=100

# Verify cloud credentials are available to the CCM
kubectl get secret -n kube-system | grep cloud
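The triage steps above can be condensed into a small helper. This is a sketch under stated assumptions: the Service is a dict parsed from `kubectl get svc web-external -o json`, the event messages are plain strings, and the `HINTS` mapping is a hypothetical lookup built from the example errors in this section rather than an exhaustive list.

```python
# Substrings taken from the example events above, mapped to where to look next.
HINTS = [
    ("AccessDenied", "cloud credentials or IAM permissions"),
    ("SubnetNotFound", "subnet configuration"),
]


def is_pending(service):
    """A LoadBalancer Service is pending while status.loadBalancer.ingress is empty."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress")
    return not ingress


def diagnose(service, event_messages):
    """Return a short hint for a LoadBalancer Service based on its events."""
    if not is_pending(service):
        return "provisioned"
    for message in event_messages:
        for needle, hint in HINTS:
            if needle in message:
                return f"pending: check {hint}"
    return "pending: no known error in events, check the CCM logs"
```

The fallback case matters in practice: when the Service has no events at all, the CCM may not be running or may not be watching Services, so its logs are the next place to look.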

Bare Metal and On-Premises

On bare metal clusters, the cloud-controller-manager is typically not used, because there is no cloud API for it to call. For LoadBalancer functionality, projects like MetalLB provide an implementation that works without a cloud provider by using BGP or Layer 2 protocols to advertise Service IPs. For node metadata and lifecycle management, tools like Cluster API can provide similar abstractions for on-premises infrastructure.

Why Interviewers Ask This

Interviewers ask this to evaluate whether a candidate understands how Kubernetes integrates with cloud platforms. It tests knowledge of the architectural separation between cloud-agnostic Kubernetes core and cloud-specific functionality, which matters for multi-cloud strategies and troubleshooting cloud resource provisioning.

Common Follow-Up Questions

What controllers does the cloud-controller-manager run?
It typically runs a Node controller (initializes nodes with cloud metadata and detects when cloud VMs are terminated), a Route controller (configures cloud network routes), and a Service controller (manages cloud load balancers for LoadBalancer-type Services).

Do you need the cloud-controller-manager on bare metal?
Usually not. On bare metal or on-premises clusters, there is typically no cloud API to integrate with. However, projects like MetalLB provide LoadBalancer functionality without a cloud provider.

How was cloud logic handled before the cloud-controller-manager?
Cloud-specific code was embedded directly in the kube-controller-manager and kubelet. The extraction into a separate binary allows cloud providers to develop and release their integrations independently of the Kubernetes release cycle.

Key Takeaways

  • It decouples cloud-specific logic from the core Kubernetes codebase
  • Cloud providers maintain their own controller manager binaries (aws-cloud-controller-manager, gce-cloud-controller-manager, etc.)
  • It manages node lifecycle, cloud routes, and LoadBalancer Services in cloud environments