What is the kube-controller-manager and what controllers does it run?
The kube-controller-manager is a single binary that runs multiple controller loops, each responsible for reconciling a specific aspect of cluster state. Controllers watch the API server for changes and take action to move actual state toward desired state, implementing the core declarative model of Kubernetes.
Detailed Answer
The kube-controller-manager is a control plane component that packages many distinct controllers into a single process. Each controller is a control loop that watches the shared state of the cluster through the API server and makes changes to move the current state toward the desired state. This reconciliation pattern is the engine behind Kubernetes's declarative model.
Core Controllers
The kube-controller-manager runs dozens of controllers. Here are the most important ones:
Deployment Controller -- Manages Deployments by creating, updating, and deleting ReplicaSets. It handles rolling updates, rollbacks, and scaling by adjusting the ReplicaSet replica counts.
ReplicaSet Controller -- Ensures the specified number of pod replicas are running at all times. If a pod dies, it creates a new one. If there are too many, it terminates the excess.
Node Controller -- Monitors node health by checking heartbeats. If a node stops reporting, the controller taints it and eventually evicts pods so they can be rescheduled elsewhere.
Job Controller -- Manages Job objects, ensuring the specified number of completions are achieved by creating pods and tracking their success or failure.
EndpointSlice Controller -- Maintains EndpointSlice objects that map Services to the pods that back them, updating when pods are created, deleted, or change readiness.
Service Account Controller -- Creates the default ServiceAccount in each new namespace (a separate token controller manages legacy token Secrets; since v1.24 tokens are normally issued on demand rather than stored as Secrets).
Namespace Controller -- Handles namespace deletion by cleaning up all resources within the namespace when it is deleted.
Garbage Collection Controller -- Deletes objects whose owner references point to objects that no longer exist (cascading deletion).
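The owner-reference logic behind cascading deletion can be sketched as a toy in Go. This is an illustration only, not the real garbage collector: the `object` type and `garbageCollect` function are invented here, and the actual controller also honors deletion propagation policies and blocking finalizers.

```go
package main

import "fmt"

// object is a toy stand-in for a Kubernetes object with owner references.
type object struct {
	uid    string
	owners []string // UIDs of owning objects
}

// garbageCollect returns the UIDs of objects all of whose owner references
// point at UIDs that no longer exist in the cluster.
func garbageCollect(objs []object) []string {
	live := map[string]bool{}
	for _, o := range objs {
		live[o.uid] = true
	}
	var doomed []string
	for _, o := range objs {
		if len(o.owners) == 0 {
			continue // no owners: never garbage collected
		}
		orphaned := true
		for _, owner := range o.owners {
			if live[owner] {
				orphaned = false
				break
			}
		}
		if orphaned {
			doomed = append(doomed, o.uid)
		}
	}
	return doomed
}

func main() {
	objs := []object{
		{uid: "rs-1", owners: []string{"deploy-1"}}, // owner "deploy-1" no longer exists
		{uid: "pod-1", owners: []string{"rs-1"}},    // owner still exists
		{uid: "deploy-2"},                           // no owners
	}
	fmt.Println(garbageCollect(objs)) // → [rs-1]
}
```

Note that pod-1 survives this pass even though its owner rs-1 is doomed; in the real controller, deleting rs-1 then orphans pod-1 on a later pass, which is what makes the deletion cascade.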
The Reconciliation Loop
Every controller follows the same pattern:
1. Watch: Subscribe to API server events for relevant objects
2. Observe: Get the current state of the object
3. Compare: Determine the difference between current and desired state
4. Act: Take the minimum action needed to converge toward desired state
5. Repeat
For example, the ReplicaSet controller:
Watch: ReplicaSet and Pod events
Observe: ReplicaSet desires 3 replicas, currently 2 pods exist
Compare: Need 1 more pod
Act: Create 1 new pod via the API server
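The ReplicaSet steps above can be sketched as a minimal Go function. This is a toy model, not the real controller's code: the function name and signature are invented for illustration, and the real controller acts by issuing create/delete calls to the API server rather than returning a plan.

```go
package main

import "fmt"

// reconcileReplicaSet is a toy version of the ReplicaSet controller's sync:
// compare the desired replica count to the pods that exist and compute the
// minimum action (create or delete) needed to converge.
func reconcileReplicaSet(desired int, pods []string) (create int, remove []string) {
	if diff := desired - len(pods); diff > 0 {
		return diff, nil // too few pods: create the difference
	}
	return 0, pods[desired:] // too many pods: delete the excess
}

func main() {
	// Observe: the ReplicaSet desires 3 replicas, 2 pods currently exist.
	create, remove := reconcileReplicaSet(3, []string{"pod-a", "pod-b"})
	// Compare + Act: create exactly one pod via the API server.
	fmt.Printf("create %d, delete %v\n", create, remove) // → create 1, delete []
}
```

Running this function again after the new pod appears yields no action at all, which is the defining property of a reconciliation loop: it is idempotent once actual state matches desired state.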
Observing Controllers in Action
# Watch a Deployment rollout to see the controller in action
kubectl create deployment nginx --image=nginx:1.26 --replicas=3
kubectl rollout status deployment/nginx
# Scale the deployment and watch the ReplicaSet controller respond
kubectl scale deployment nginx --replicas=5
kubectl get replicasets -w
# Delete a pod and watch the controller recreate it
kubectl delete pod $(kubectl get pods -l app=nginx -o name | head -1)
kubectl get pods -l app=nginx -w
# Check controller manager logs (on kubeadm clusters the pod name is kube-controller-manager-<node-name>)
kubectl logs -n kube-system kube-controller-manager-controlplane
# View leader election lease
kubectl get lease -n kube-system kube-controller-manager -o yaml
Controller Manager Configuration
The controller manager is configured via command-line flags in its static pod manifest:
# /etc/kubernetes/manifests/kube-controller-manager.yaml (excerpt)
spec:
  containers:
  - command:
    - kube-controller-manager
    - --kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=127.0.0.1
    - --leader-elect=true
    - --controllers=*,bootstrapsigner,tokencleaner
    - --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
    - --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
    - --root-ca-file=/etc/kubernetes/pki/ca.crt
    - --service-account-private-key-file=/etc/kubernetes/pki/sa.key
    - --use-service-account-credentials=true
    - --node-monitor-period=5s
    - --node-monitor-grace-period=40s
    - --pod-eviction-timeout=5m0s
Key flags to understand:
--node-monitor-grace-period -- How long to wait before marking a node as unhealthy (default 40s)
--pod-eviction-timeout -- How long to wait before evicting pods from an unhealthy node (default 5m)
--controllers -- Which controllers to enable/disable
--concurrent-deployment-syncs -- Number of concurrent Deployment reconciliations (default 5)
Node Lifecycle and Eviction
The Node controller is particularly important for cluster reliability:
Node stops sending heartbeats
-> After node-monitor-grace-period (40s): Node marked as "Unknown"
-> Node is tainted with node.kubernetes.io/unreachable (NoExecute effect)
-> After the eviction delay (5m by default; with taint-based eviction this comes from the default tolerationSeconds of 300 added to pods): Pods are evicted
-> Evicted pods with controllers (Deployments, etc.) are rescheduled elsewhere
-> Standalone pods are lost
You can observe this behavior:
# Check node conditions
kubectl get nodes -o wide
kubectl describe node worker-1 | grep -A5 Conditions
# View taints on nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
Custom Controllers
The controller pattern is not limited to built-in controllers. The Kubernetes ecosystem heavily uses custom controllers (often called "operators") that follow the same reconciliation loop pattern to manage application-specific resources through Custom Resource Definitions (CRDs).
Why Interviewers Ask This
This question evaluates a candidate's understanding of the reconciliation loop pattern that is fundamental to Kubernetes. It reveals whether they can reason about how self-healing works, how Deployments roll out, and how the system recovers from failures automatically.
Key Takeaways
- Each controller implements a reconciliation loop: observe current state, compare to desired state, take corrective action
- The controller manager bundles dozens of controllers into a single process for operational simplicity
- Leader election ensures exactly one active controller manager in multi-master setups