What Is the Operator Pattern in Kubernetes?


TL;DR

The Operator pattern extends Kubernetes by encoding domain-specific operational knowledge into a custom controller. It uses Custom Resource Definitions (CRDs) to define new resource types and a controller that watches them to automate complex application lifecycle management.

Detailed Answer

The Operator pattern is a method of extending Kubernetes to manage complex applications by encoding operational expertise into software. Instead of a human running manual procedures (scale the database, perform a backup, handle failover), an Operator automates these tasks.

Core Concept

An Operator consists of two parts:

Custom Resource Definition (CRD) — defines a new resource type (e.g., PostgresCluster)
Controller — watches CRD instances and reconciles actual state with desired state

User creates:                   Controller sees:              Controller acts:
PostgresCluster                 "Desired: 3 replicas"         Creates 3 StatefulSets
  replicas: 3        →          "Actual: 0 replicas"    →     Configures replication
  backup: daily                 "Backup: not configured"      Schedules CronJob for backups
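The comparison in the diagram can be sketched as a pure function that turns desired and observed state into a list of actions. The Desired and Actual types and the string actions are hypothetical stand-ins. A real controller would issue Kubernetes API calls instead of returning strings.

```go
package main

import "fmt"

// Desired is what the user declared in the custom resource (hypothetical fields).
type Desired struct {
	Replicas int
	Backup   string // e.g. "daily"; empty means no backups requested
}

// Actual is what the controller observed in the cluster.
type Actual struct {
	Replicas              int
	ReplicationConfigured bool
	BackupConfigured      bool
}

// plan diffs desired against actual state and lists corrective actions.
// It only decides; executing the actions is a separate step.
func plan(d Desired, a Actual) []string {
	var actions []string
	if a.Replicas < d.Replicas {
		actions = append(actions, fmt.Sprintf("create %d instance(s)", d.Replicas-a.Replicas))
	} else if a.Replicas > d.Replicas {
		actions = append(actions, fmt.Sprintf("remove %d instance(s)", a.Replicas-d.Replicas))
	}
	if d.Replicas > 1 && !a.ReplicationConfigured {
		actions = append(actions, "configure replication")
	}
	if d.Backup != "" && !a.BackupConfigured {
		actions = append(actions, "schedule "+d.Backup+" backup CronJob")
	}
	return actions
}

func main() {
	d := Desired{Replicas: 3, Backup: "daily"}
	fmt.Println(plan(d, Actual{}))
	// Once everything matches, there is nothing to do: prints []
	fmt.Println(plan(d, Actual{Replicas: 3, ReplicationConfigured: true, BackupConfigured: true}))
}
```

Keeping the diff free of side effects makes it easy to unit-test separately from the code that talks to the API server.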

The Reconciliation Loop

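// Simplified reconciliation logic (controller-runtime style).
// PostgresClusterReconciler is assumed to embed client.Client; v1.PostgresCluster
// and the helpers getActualState, scaleUp, configureBackup and performFailover
// are illustrative placeholders, not a real operator's API.
import (
    "context"
    "time"

    ctrl "sigs.k8s.io/controller-runtime"
    "sigs.k8s.io/controller-runtime/pkg/client"
)

func (r *PostgresClusterReconciler) Reconcile(ctx context.Context,
    req ctrl.Request) (ctrl.Result, error) {

    // 1. Fetch the custom resource; NotFound means it was deleted, so stop
    cluster := &v1.PostgresCluster{}
    if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // 2. Observe the actual state to compare with cluster.Spec
    actual, err := r.getActualState(ctx, cluster)
    if err != nil {
        return ctrl.Result{}, err
    }

    // 3. Take corrective actions; returning an error requeues with backoff
    if actual.Replicas < cluster.Spec.Replicas {
        err = r.scaleUp(ctx, cluster)
    }
    if err == nil && !actual.BackupConfigured && cluster.Spec.Backup.Enabled {
        err = r.configureBackup(ctx, cluster)
    }
    if err == nil && actual.NeedsFailover {
        err = r.performFailover(ctx, cluster)
    }
    if err != nil {
        return ctrl.Result{}, err
    }

    // 4. Update status (written through the status subresource)
    cluster.Status.ReadyReplicas = actual.ReadyReplicas
    if err := r.Status().Update(ctx, cluster); err != nil {
        return ctrl.Result{}, err
    }

    // 5. Requeue for the next reconciliation even without a watch event
    return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}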
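The snippet above relies on hypothetical helpers and the controller-runtime client, so it cannot run on its own. The property it depends on can be shown in a small self-contained simulation. Reconcile is level-triggered and idempotent: each call reads the current state, moves the fake cluster one step toward the spec, and changes nothing once converged. The fakeCluster type and the one-step-per-pass rule are assumptions made for illustration.

```go
package main

import "fmt"

// fakeCluster stands in for real cluster state (hypothetical).
type fakeCluster struct {
	replicas         int
	backupConfigured bool
}

// spec is the desired state taken from the custom resource.
type spec struct {
	replicas      int
	backupEnabled bool
}

// reconcile performs at most one corrective step per call and reports
// whether it changed anything. It re-reads the current state every time
// instead of remembering what it did before, which keeps it idempotent.
func reconcile(s spec, c *fakeCluster) (changed bool) {
	switch {
	case c.replicas < s.replicas:
		c.replicas++ // add one instance at a time
		return true
	case c.replicas > s.replicas:
		c.replicas--
		return true
	case s.backupEnabled && !c.backupConfigured:
		c.backupConfigured = true
		return true
	}
	return false // already converged: nothing to do
}

func main() {
	s := spec{replicas: 3, backupEnabled: true}
	c := &fakeCluster{}
	passes := 0
	for reconcile(s, c) {
		passes++
	}
	fmt.Printf("converged after %d passes: %+v\n", passes, *c)
	// A further pass (like the periodic requeue) makes no changes.
	fmt.Println("changed on extra pass:", reconcile(s, c))
}
```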

A Real Example: PostgreSQL Operator

# CRD instance — user creates this
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: production-db
  namespace: production
spec:
  postgresVersion: 16
  instances:
    - name: primary
      replicas: 3
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
            incremental: "0 1 * * 1-6"
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 200Gi

The Operator handles everything:

Creates StatefulSets for primary and replicas
Configures streaming replication
Sets up connection pooling (PgBouncer) when spec.proxy.pgBouncer is configured (the manifest above omits it)
Schedules backups with pgBackRest
Handles automatic failover if the primary fails
Manages TLS certificates
Performs rolling upgrades
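Failover is the clearest case of application knowledge that a plain StatefulSet lacks. Below is a simplified sketch of one common idea: promote the healthy replica that has replayed the most WAL, so the least committed data is lost. In practice the Crunchy and Zalando operators both delegate leader election and failover to Patroni, which weighs further signals such as maximum allowed lag. The replica type and the representation of an LSN as a plain integer are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// replica is a hypothetical view of one PostgreSQL instance.
type replica struct {
	name      string
	healthy   bool
	replayLSN uint64 // WAL position replayed so far, as a plain integer
}

// pickNewPrimary returns the healthy replica with the highest replayed
// WAL position, i.e. the one that loses the least data if promoted.
func pickNewPrimary(rs []replica) (replica, error) {
	var best replica
	found := false
	for _, r := range rs {
		if !r.healthy {
			continue // never promote an unhealthy instance
		}
		if !found || r.replayLSN > best.replayLSN {
			best, found = r, true
		}
	}
	if !found {
		return replica{}, errors.New("no healthy replica to promote")
	}
	return best, nil
}

func main() {
	rs := []replica{
		{name: "db-1", healthy: true, replayLSN: 0x3000060},
		{name: "db-2", healthy: true, replayLSN: 0x30000A0},
		{name: "db-3", healthy: false, replayLSN: 0x3000100},
	}
	p, err := pickNewPrimary(rs)
	if err != nil {
		fmt.Println("failover blocked:", err)
		return
	}
	fmt.Println("promote", p.name)
}
```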

Operator Capability Levels

The Operator Framework defines five maturity levels:

| Level | Capabilities | Example |
|-------|-------------|---------|
| 1 - Basic Install | Automated install, configuration | Helm chart equivalent |
| 2 - Seamless Upgrades | Version upgrades without downtime | Rolling PostgreSQL upgrades |
| 3 - Full Lifecycle | Backup, restore, failure recovery | Automated pg_basebackup + restore |
| 4 - Deep Insights | Metrics, alerts, log processing | Prometheus integration, custom dashboards |
| 5 - Auto Pilot | Auto-scaling, auto-tuning, auto-healing | Automatic connection pool resizing |

Popular Operators

| Application | Operator | Level |
|------------|----------|-------|
| Prometheus | prometheus-operator | 5 |
| PostgreSQL | Crunchy / Zalando | 4-5 |
| MySQL | Oracle MySQL Operator | 3-4 |
| Redis | Redis Operator (OpsTree) | 3-4 |
| Elasticsearch | ECK (Elastic Cloud on K8s) | 4 |
| Kafka | Strimzi | 4-5 |
| Cert-Manager | cert-manager | 5 |
| ArgoCD | argocd-operator | 3 |

Operator vs. Helm vs. StatefulSet

| Capability | StatefulSet | Helm Chart | Operator |
|-----------|-------------|------------|----------|
| Ordered deployment | Yes | Yes | Yes |
| Stable network identity | Yes | Yes | Yes |
| Automatic backup | No | No (manual CronJob) | Yes |
| Failover handling | No | No | Yes |
| Version upgrade | Manual | Manual | Automated |
| Self-healing | Pod restart only | Pod restart only | Application-aware recovery |
| Configuration management | Manual | Values-driven | Continuous reconciliation |

Finding Operators

# OperatorHub.io — the central catalog
# https://operatorhub.io

# With OLM (Operator Lifecycle Manager)
kubectl get packagemanifests

# Artifact Hub
# https://artifacthub.io/packages/search?kind=3

When to Use an Operator

| Scenario | Use Operator? |
|----------|--------------|
| Stateless web app | No: a Deployment is sufficient |
| Database cluster with replication | Yes: complex lifecycle management |
| Message queue with partitioning | Yes: operational automation |
| Simple key-value store | Maybe: depends on operational needs |
| Certificate management | Yes: cert-manager is an Operator |
| Monitoring stack | Yes: prometheus-operator simplifies management |

Risks and Considerations

Operator quality varies: Community Operators range from production-ready to experimental.
CRD sprawl: Each Operator adds CRDs. Track what is installed.
Upgrade complexity: The Operator itself must be upgraded alongside the application.
Lock-in: CRDs are Operator-specific. Migrating between Operators is non-trivial.
RBAC requirements: Operators often request broad, cluster-wide permissions (ClusterRoles). Review them before installing.

Why Interviewers Ask This

Operators are how the Kubernetes ecosystem manages stateful and complex applications. Understanding the pattern shows you can extend Kubernetes beyond its built-in primitives.

Common Follow-Up Questions

How does an Operator differ from a Helm chart?

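A Helm chart renders templates into manifests and applies them when you run install or upgrade. After that, Helm does nothing until the next install, upgrade, or rollback. An Operator is an active controller that continuously reconciles the desired state, handles Day 2 operations (backups, scaling, upgrades), and self-heals.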
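To see how the two behave after installation, consider a sketch in which a plain map stands in for cluster state (an assumption for illustration). applyOnce writes the desired values a single time, roughly like a one-off install. reconcile re-checks every value on each pass and corrects any drift.

```go
package main

import "fmt"

// state is a toy stand-in for cluster state (hypothetical).
type state map[string]int

// applyOnce copies desired values into the cluster a single time;
// later drift is never noticed.
func applyOnce(desired, cluster state) {
	for k, v := range desired {
		cluster[k] = v
	}
}

// reconcile compares every desired key with the cluster and corrects
// differences; it returns how many keys it had to fix.
func reconcile(desired, cluster state) int {
	fixed := 0
	for k, v := range desired {
		if cluster[k] != v {
			cluster[k] = v
			fixed++
		}
	}
	return fixed
}

func main() {
	desired := state{"replicas": 3}

	helmLike := state{}
	applyOnce(desired, helmLike)
	helmLike["replicas"] = 1 // someone scales down by hand
	fmt.Println("one-shot apply:", helmLike["replicas"])

	operatorLike := state{}
	reconcile(desired, operatorLike)
	operatorLike["replicas"] = 1     // the same manual change
	reconcile(desired, operatorLike) // the next reconcile pass
	fmt.Println("reconciled:", operatorLike["replicas"])
}
```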

What is the reconciliation loop?

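The controller watches for changes to its custom resource (and to the objects it owns), compares the desired state with the actual state, and takes actions to converge them. The loop is triggered by watch events and periodic requeues rather than polling nonstop. Each pass must be idempotent: reconciling an already converged resource should change nothing.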
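A controller does not poll in a tight loop. Watch events put the object's namespace/name key on a queue, and workers call Reconcile for each key. client-go's workqueue, which controller-runtime uses, does not hold the same key twice while it is waiting, so a burst of events for one object becomes a single reconcile. Below is a minimal sketch of that deduplication alone, not the client-go implementation. It leaves out the in-flight tracking and rate limiting that the real queue provides, and the dedupQueue type is hypothetical.

```go
package main

import "fmt"

// dedupQueue holds object keys ("namespace/name") waiting to be reconciled.
// A key that is already queued is not added again.
type dedupQueue struct {
	order   []string
	pending map[string]bool
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{pending: map[string]bool{}}
}

// Add enqueues key unless it is already waiting.
func (q *dedupQueue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Get removes and returns the oldest key; ok is false when the queue is empty.
func (q *dedupQueue) Get() (key string, ok bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key = q.order[0]
	q.order = q.order[1:]
	delete(q.pending, key)
	return key, true
}

func main() {
	q := newDedupQueue()
	// Three watch events for one object plus one for another object.
	for _, k := range []string{
		"production/production-db",
		"production/production-db",
		"staging/test-db",
		"production/production-db",
	} {
		q.Add(k)
	}
	for key, ok := q.Get(); ok; key, ok = q.Get() {
		fmt.Println("reconcile", key)
	}
}
```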

Can you give an example of a widely-used Operator?

The Prometheus Operator manages Prometheus instances via ServiceMonitor and Prometheus CRDs. The PostgreSQL Operator (Zalando) manages entire PostgreSQL clusters including replication, failover, and backups.

Key Takeaways

Operators encode operational knowledge as code — automating tasks that would otherwise require manual intervention.
The pattern combines CRDs (new resource types) with controllers (reconciliation logic).
Operators excel at managing stateful applications that need Day 2 operations like backup, upgrade, and failover.