What Is the Operator Pattern in Kubernetes?
The Operator pattern extends Kubernetes by encoding domain-specific operational knowledge into a custom controller. It uses Custom Resource Definitions (CRDs) to define new resource types and a controller that watches them to automate complex application lifecycle management.
Detailed Answer
The Operator pattern is a method of extending Kubernetes to manage complex applications by encoding operational expertise into software. Instead of a human running manual procedures (scale the database, perform a backup, handle failover), an Operator automates these tasks.
Core Concept
An Operator consists of two parts:
- Custom Resource Definition (CRD): defines a new resource type (e.g., PostgresCluster)
- Controller: watches custom resources of that type and reconciles actual state with desired state
User creates:          Controller sees:               Controller acts:
PostgresCluster        "Desired: 3 replicas"          Creates 3 StatefulSets
  replicas: 3     →    "Actual: 0 replicas"      →    Configures replication
  backup: daily        "Backup: not configured"       Schedules CronJob for backups
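The comparison in the diagram can be sketched as a pure function that turns a desired state and an observed state into a list of actions. This is only an illustration: the Spec and Observed types, their fields, and the action strings are hypothetical and do not come from any real Operator.

```go
package main

import "fmt"

// Spec is the desired state a user declares (hypothetical fields).
type Spec struct {
	Replicas      int
	BackupEnabled bool
}

// Observed is what the controller finds in the cluster (hypothetical fields).
type Observed struct {
	Replicas         int
	BackupConfigured bool
}

// Plan compares desired and observed state and returns the actions
// needed to close the gap. An empty plan means the states already match.
func Plan(want Spec, have Observed) []string {
	var actions []string
	if have.Replicas < want.Replicas {
		actions = append(actions, fmt.Sprintf("scale up by %d", want.Replicas-have.Replicas))
	}
	if have.Replicas > want.Replicas {
		actions = append(actions, fmt.Sprintf("scale down by %d", have.Replicas-want.Replicas))
	}
	if want.BackupEnabled && !have.BackupConfigured {
		actions = append(actions, "schedule backups")
	}
	return actions
}

func main() {
	want := Spec{Replicas: 3, BackupEnabled: true}
	// Fresh cluster: nothing exists yet, so both actions are planned.
	fmt.Println(Plan(want, Observed{}))
	// Converged cluster: the plan is empty.
	fmt.Println(Plan(want, Observed{Replicas: 3, BackupConfigured: true}))
}
```

Keeping the decision logic free of side effects like this makes it easy to unit test separately from the code that talks to the Kubernetes API.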
The Reconciliation Loop
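// Simplified reconciliation logic. Helpers such as getActualState and scaleUp are illustrative.
// Assumes imports: context, time, ctrl (sigs.k8s.io/controller-runtime),
// client (sigs.k8s.io/controller-runtime/pkg/client), and v1 for the project's own API types.
func (r *PostgresClusterReconciler) Reconcile(ctx context.Context,
	req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the custom resource
	cluster := &v1.PostgresCluster{}
	if err := r.Get(ctx, req.NamespacedName, cluster); err != nil {
		// The resource may have been deleted; in that case there is nothing to reconcile
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}

	// 2. Compare desired vs actual state
	actual := r.getActualState(cluster)

	// 3. Take corrective actions
	if actual.Replicas < cluster.Spec.Replicas {
		r.scaleUp(cluster)
	}
	if !actual.BackupConfigured && cluster.Spec.Backup.Enabled {
		r.configureBackup(cluster)
	}
	if actual.NeedsFailover {
		r.performFailover(cluster)
	}

	// 4. Update status; returning the error makes the controller retry
	cluster.Status.ReadyReplicas = actual.ReadyReplicas
	if err := r.Status().Update(ctx, cluster); err != nil {
		return ctrl.Result{}, err
	}

	// 5. Requeue for next reconciliation
	return ctrl.Result{RequeueAfter: 30 * time.Second}, nil
}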
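Two properties make this loop robust: each call is idempotent, and the controller keeps calling it until the world matches the spec. The following stdlib-only simulation is a sketch of that behavior under stated assumptions. World, Result, and the retry and resync intervals are hypothetical stand-ins, not controller-runtime types.

```go
package main

import (
	"fmt"
	"time"
)

const (
	retry  = time.Second      // assumed interval while work remains
	resync = 30 * time.Second // assumed steady-state interval
)

// Result mirrors the idea of a reconcile result: ask to be called again later.
type Result struct {
	RequeueAfter time.Duration
}

// World is a hypothetical in-memory stand-in for the cluster.
type World struct {
	DesiredReplicas int
	RunningReplicas int
}

// Reconcile makes at most one corrective step per call and is safe to repeat:
// once the world matches the desired state it changes nothing.
func Reconcile(w *World) Result {
	switch {
	case w.RunningReplicas < w.DesiredReplicas:
		w.RunningReplicas++
		return Result{RequeueAfter: retry}
	case w.RunningReplicas > w.DesiredReplicas:
		w.RunningReplicas--
		return Result{RequeueAfter: retry}
	default:
		return Result{RequeueAfter: resync}
	}
}

// RunUntilStable calls Reconcile until it reports the steady-state interval
// and returns the number of calls made (at most maxCalls).
func RunUntilStable(w *World, maxCalls int) int {
	for i := 1; i <= maxCalls; i++ {
		if Reconcile(w).RequeueAfter == resync {
			return i
		}
	}
	return maxCalls
}

func main() {
	w := &World{DesiredReplicas: 3}
	calls := RunUntilStable(w, 10)
	fmt.Println("calls:", calls, "running:", w.RunningReplicas)
}
```

Starting from zero replicas with three desired, the loop takes four calls: three corrective steps plus one that confirms the steady state. Any further call leaves the world unchanged, which is what makes periodic requeues safe.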
A Real Example: PostgreSQL Operator
# Custom resource (an instance of the PostgresCluster CRD), created by the user
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: production-db
  namespace: production
spec:
  postgresVersion: 16
  instances:
    - name: primary
      replicas: 3
      dataVolumeClaimSpec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: fast-ssd
      resources:
        requests:
          cpu: "2"
          memory: "8Gi"
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"
            incremental: "0 1 * * 1-6"
          volume:
            volumeClaimSpec:
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 200Gi
From this single resource, the Operator takes over the operational work:
- Creates StatefulSets for primary and replicas
- Configures streaming replication
- Sets up connection pooling with PgBouncer, when the spec requests a proxy
- Schedules backups with pgBackRest
- Handles automatic failover if the primary fails
- Manages TLS certificates
- Performs rolling upgrades
Operator Capability Levels
The Operator Framework defines five maturity levels:
| Level | Capabilities | Example |
|-------|-------------|---------|
| 1 - Basic Install | Automated install, configuration | Helm chart equivalent |
| 2 - Seamless Upgrades | Version upgrades without downtime | Rolling PostgreSQL upgrades |
| 3 - Full Lifecycle | Backup, restore, failure recovery | Automated pg_basebackup + restore |
| 4 - Deep Insights | Metrics, alerts, log processing | Prometheus integration, custom dashboards |
| 5 - Auto Pilot | Auto-scaling, auto-tuning, auto-healing | Automatic connection pool resizing |
Popular Operators
| Application | Operator | Level |
|------------|----------|-------|
| Prometheus | prometheus-operator | 5 |
| PostgreSQL | Crunchy / Zalando | 4-5 |
| MySQL | Oracle MySQL Operator | 3-4 |
| Redis | Redis Operator (OpsTree) | 3-4 |
| Elasticsearch | ECK (Elastic Cloud on K8s) | 4 |
| Kafka | Strimzi | 4-5 |
| Cert-Manager | cert-manager | 5 |
| ArgoCD | argocd-operator | 3 |
Operator vs. Helm vs. StatefulSet
| Capability | StatefulSet | Helm Chart | Operator |
|-----------|-------------|------------|----------|
| Ordered deployment | Yes | Yes | Yes |
| Stable network identity | Yes | Yes | Yes |
| Automatic backup | No | No (manual CronJob) | Yes |
| Failover handling | No | No | Yes |
| Version upgrade | Manual | Manual | Automated |
| Self-healing | Pod restart only | Pod restart only | Application-aware recovery |
| Configuration management | Manual | Values-driven | Continuous reconciliation |
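To make "application-aware recovery" concrete, here is a minimal sketch of one decision a database Operator might make during failover: promoting the healthy replica that has lost the least data. The Replica type, its fields, and the selection rule are assumptions for illustration only. Production Operators commonly delegate this decision to dedicated high-availability tooling rather than implementing it inline.

```go
package main

import "fmt"

// Replica is a hypothetical view of one database pod.
type Replica struct {
	Name     string
	Healthy  bool
	LagBytes int64 // replication lag behind the failed primary
}

// PickNewPrimary returns the healthy replica with the least replication lag.
// ok is false when no healthy replica exists and failover cannot proceed.
func PickNewPrimary(replicas []Replica) (best Replica, ok bool) {
	for _, r := range replicas {
		if !r.Healthy {
			continue // never promote an unhealthy replica, even with zero lag
		}
		if !ok || r.LagBytes < best.LagBytes {
			best, ok = r, true
		}
	}
	return best, ok
}

func main() {
	replicas := []Replica{
		{Name: "db-1", Healthy: true, LagBytes: 4096},
		{Name: "db-2", Healthy: false, LagBytes: 0},
		{Name: "db-3", Healthy: true, LagBytes: 512},
	}
	if r, ok := PickNewPrimary(replicas); ok {
		fmt.Println("promote", r.Name)
	} else {
		fmt.Println("no healthy replica available")
	}
}
```

A StatefulSet alone would only restart the failed pod. It has no notion of replication lag or of which replica is safe to promote, and that is the kind of domain knowledge an Operator encodes.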
Finding Operators
# OperatorHub.io — the central catalog
# https://operatorhub.io
# With OLM (Operator Lifecycle Manager)
kubectl get packagemanifests
# Artifact Hub
# https://artifacthub.io/packages/search?kind=3
When to Use an Operator
| Scenario | Use Operator? |
|----------|--------------|
| Stateless web app | No: a Deployment is sufficient |
| Database cluster with replication | Yes: complex lifecycle management |
| Message queue with partitioning | Yes: operational automation |
| Simple key-value store | Maybe: depends on operational needs |
| Certificate management | Yes: cert-manager is an Operator |
| Monitoring stack | Yes: prometheus-operator simplifies management |
Risks and Considerations
- Operator quality varies: Community Operators range from production-ready to experimental
- CRD sprawl: Each Operator adds CRDs. Track what is installed.
- Upgrade complexity: The Operator itself must be upgraded alongside the application
- Lock-in: CRDs are Operator-specific. Migrating between Operators is non-trivial.
- RBAC requirements: Operators often need broad cluster permissions
Why Interviewers Ask This
Operators are how the Kubernetes ecosystem manages stateful and complex applications. Understanding the pattern shows you can extend Kubernetes beyond its built-in primitives.
Key Takeaways
- Operators encode operational knowledge as code — automating tasks that would otherwise require manual intervention.
- The pattern combines CRDs (new resource types) with controllers (reconciliation logic).
- Operators excel at managing stateful applications that need Day 2 operations like backup, upgrade, and failover.