How Do You Build a Kubernetes Operator?

advanced | operators | sre | platform engineer | CKA
TL;DR

Building a Kubernetes Operator involves defining a CRD for your custom resource, implementing a controller with reconciliation logic, and deploying both to the cluster. Frameworks like Kubebuilder and Operator SDK scaffold the boilerplate so you can focus on business logic.

Detailed Answer

Building a Kubernetes Operator involves four main steps: scaffolding the project, defining the API (CRD), implementing the controller logic, and deploying it to the cluster.

Step 1: Scaffold with Kubebuilder

# Initialize the project
kubebuilder init --domain example.com --repo github.com/myorg/my-operator

# Create a new API (CRD + Controller)
kubebuilder create api --group app --version v1 --kind WebApp

# Generated project structure:
# my-operator/
# ├── api/v1/
# │   ├── webapp_types.go       # CRD type definitions
# │   └── zz_generated.deepcopy.go
# ├── internal/controller/
# │   └── webapp_controller.go  # Reconciliation logic
# ├── config/
# │   ├── crd/                  # Generated CRD YAML
# │   ├── rbac/                 # RBAC manifests
# │   └── manager/              # Controller Deployment
# ├── cmd/main.go               # Entrypoint
# ├── Dockerfile
# └── Makefile

Step 2: Define the API (CRD Types)

// api/v1/webapp_types.go
package v1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// WebAppSpec defines the desired state
type WebAppSpec struct {
    // Image is the container image to deploy
    Image string `json:"image"`

    // Replicas is the number of desired Pods
    // +kubebuilder:validation:Minimum=1
    // +kubebuilder:validation:Maximum=100
    Replicas int32 `json:"replicas"`

    // Port is the container port
    Port int32 `json:"port,omitempty"`

    // Ingress configuration
    Ingress *IngressSpec `json:"ingress,omitempty"`
}

type IngressSpec struct {
    Enabled bool   `json:"enabled"`
    Host    string `json:"host"`
}

// WebAppStatus defines the observed state
type WebAppStatus struct {
    // ReadyReplicas is the number of Pods that are ready
    ReadyReplicas int32 `json:"readyReplicas"`

    // URL is the public URL if Ingress is enabled
    URL string `json:"url,omitempty"`

    // Conditions represent the latest observations
    Conditions []metav1.Condition `json:"conditions,omitempty"`
}

// +kubebuilder:object:root=true
// +kubebuilder:subresource:status
// +kubebuilder:printcolumn:name="Replicas",type=integer,JSONPath=`.spec.replicas`
// +kubebuilder:printcolumn:name="Ready",type=integer,JSONPath=`.status.readyReplicas`
// +kubebuilder:printcolumn:name="URL",type=string,JSONPath=`.status.url`
type WebApp struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              WebAppSpec   `json:"spec,omitempty"`
    Status            WebAppStatus `json:"status,omitempty"`
}

Step 3: Implement the Controller

// internal/controller/webapp_controller.go
package controller

import (
    "context"
    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/errors"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
    appv1 "github.com/myorg/my-operator/api/v1"
)

type WebAppReconciler struct {
    client.Client
}

func (r *WebAppReconciler) Reconcile(ctx context.Context,
    req ctrl.Request) (ctrl.Result, error) {

    log := ctrl.LoggerFrom(ctx)

    // 1. Fetch the WebApp custom resource
    webapp := &appv1.WebApp{}
    if err := r.Get(ctx, req.NamespacedName, webapp); err != nil {
        if errors.IsNotFound(err) {
            return ctrl.Result{}, nil  // Resource deleted
        }
        return ctrl.Result{}, err
    }

    // 2. Ensure Deployment exists and matches spec
    deployment := &appsv1.Deployment{}
    err := r.Get(ctx, req.NamespacedName, deployment)

    if errors.IsNotFound(err) {
        // Create the Deployment
        dep := r.deploymentForWebApp(webapp)
        log.Info("Creating Deployment", "name", dep.Name)
        if err := r.Create(ctx, dep); err != nil {
            return ctrl.Result{}, err
        }
        return ctrl.Result{Requeue: true}, nil
    } else if err != nil {
        // Any other Get error (e.g. transient API failure) must not fall
        // through to the update logic below
        return ctrl.Result{}, err
    }

    // 3. Update if spec changed
    if deployment.Spec.Replicas == nil || *deployment.Spec.Replicas != webapp.Spec.Replicas {
        deployment.Spec.Replicas = &webapp.Spec.Replicas
        if err := r.Update(ctx, deployment); err != nil {
            return ctrl.Result{}, err
        }
    }

    // 4. Ensure Service exists
    // ... similar create/update logic

    // 5. Update status
    webapp.Status.ReadyReplicas = deployment.Status.ReadyReplicas
    if err := r.Status().Update(ctx, webapp); err != nil {
        return ctrl.Result{}, err
    }

    return ctrl.Result{}, nil
}

func (r *WebAppReconciler) deploymentForWebApp(
    app *appv1.WebApp) *appsv1.Deployment {

    labels := map[string]string{"app": app.Name}
    return &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      app.Name,
            Namespace: app.Namespace,
            OwnerReferences: []metav1.OwnerReference{
                *metav1.NewControllerRef(app,
                    appv1.GroupVersion.WithKind("WebApp")),
            },
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &app.Spec.Replicas,
            Selector: &metav1.LabelSelector{MatchLabels: labels},
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{Labels: labels},
                Spec: corev1.PodSpec{
                    Containers: []corev1.Container{{
                        Name:  "app",
                        Image: app.Spec.Image,
                        Ports: []corev1.ContainerPort{{
                            ContainerPort: app.Spec.Port,
                        }},
                    }},
                },
            },
        },
    }
}

func (r *WebAppReconciler) SetupWithManager(
    mgr ctrl.Manager) error {
    return ctrl.NewControllerManagedBy(mgr).
        For(&appv1.WebApp{}).
        Owns(&appsv1.Deployment{}).
        Owns(&corev1.Service{}).
        Complete(r)
}

Step 4: Build and Deploy

# Generate CRD manifests
make manifests

# Build the Docker image
make docker-build IMG=myorg/my-operator:v1.0

# Push to registry
make docker-push IMG=myorg/my-operator:v1.0

# Install CRDs
make install

# Deploy the controller
make deploy IMG=myorg/my-operator:v1.0

Using the Operator

apiVersion: app.example.com/v1
kind: WebApp
metadata:
  name: my-website
  namespace: production
spec:
  image: nginx:1.27
  replicas: 3
  port: 80
  ingress:
    enabled: true
    host: www.example.com

kubectl apply -f webapp.yaml
kubectl get webapps
# NAME         REPLICAS   READY   URL
# my-website   3          3       www.example.com

Testing

// Unit test with envtest
var _ = Describe("WebApp Controller", func() {
    It("should create a Deployment", func() {
        webapp := &appv1.WebApp{
            ObjectMeta: metav1.ObjectMeta{
                Name:      "test-app",
                Namespace: "default",
            },
            Spec: appv1.WebAppSpec{
                Image:    "nginx:1.27",
                Replicas: 2,
                Port:     80,
            },
        }
        Expect(k8sClient.Create(ctx, webapp)).To(Succeed())

        // Eventually the Deployment should be created
        deployment := &appsv1.Deployment{}
        Eventually(func() error {
            return k8sClient.Get(ctx,
                types.NamespacedName{Name: "test-app", Namespace: "default"},
                deployment)
        }, timeout).Should(Succeed())

        Expect(*deployment.Spec.Replicas).To(Equal(int32(2)))
    })
})

Best Practices

  1. Set OwnerReferences on all child resources for automatic garbage collection
  2. Use the status subresource to report observed state
  3. Make reconciliation idempotent — it may be called multiple times
  4. Handle finalizers for cleanup of external resources
  5. Use server-side apply instead of create/update for conflict resolution
  6. Add RBAC markers in code comments for automatic RBAC generation
  7. Implement proper error handling — requeue with backoff on transient errors
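Best practice 6 refers to Kubebuilder's RBAC markers: comments placed above the Reconcile function that `make manifests` turns into Role/ClusterRole YAML. For this operator they would look roughly like:

```go
// +kubebuilder:rbac:groups=app.example.com,resources=webapps,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups=app.example.com,resources=webapps/status,verbs=get;update;patch
// +kubebuilder:rbac:groups=apps,resources=deployments,verbs=get;list;watch;create;update;patch;delete
// +kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
```

If the controller touches a resource without a matching marker, the deployed manager will hit RBAC "forbidden" errors at runtime.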

Why Interviewers Ask This

Building Operators is a core platform engineering skill. This question tests your ability to extend Kubernetes for domain-specific automation beyond built-in resources.

Common Follow-Up Questions

What language are most Operators written in?
Go is the most common language due to first-class support in client-go, Kubebuilder, and Operator SDK. Operators can also be written in Python, Java, Rust, or even shell scripts.

What is the reconcile loop supposed to return?
It returns a Result (with optional requeue duration) and an error. Return Requeue: true to retry, RequeueAfter: duration for periodic reconciliation, or an empty Result when reconciliation is complete.

How do you test an Operator?
Use envtest (from controller-runtime) for unit testing against a real API server. Use kind or k3d for integration tests. Use e2e frameworks like Ginkgo for end-to-end testing.

Key Takeaways

  • Use Kubebuilder or Operator SDK to scaffold the project structure and CRD boilerplate.
  • The reconcile loop is the core logic — it compares desired state with actual state and takes corrective actions.
  • Operators should be idempotent, handle errors gracefully, and update status subresources.
