How Does Prometheus Monitor Kubernetes?

Level: intermediate | Tags: monitoring, devops, sre, CKA
TL;DR

Prometheus monitors Kubernetes by scraping metrics endpoints from Pods, nodes, and cluster components. It uses Kubernetes service discovery to automatically find targets. The kube-prometheus-stack (Prometheus Operator) is the standard deployment method, providing pre-built dashboards and alerting rules.

Detailed Answer

How Prometheus Works

Prometheus is a pull-based monitoring system. It periodically scrapes HTTP endpoints (typically /metrics) on targets, parses the exposed metrics, stores them in a time-series database, and evaluates alerting rules.

The four main metric types:

  • Counter: Monotonically increasing value (e.g., total HTTP requests)
  • Gauge: Value that can go up or down (e.g., current memory usage)
  • Histogram: Distribution of values in buckets (e.g., request latency)
  • Summary: Similar to histogram but calculates quantiles client-side
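For reference, this is roughly what these types look like in the text exposition format Prometheus scrapes (metric and label names here are illustrative):

```
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{method="GET",code="200"} 1027

# HELP process_memory_bytes Current memory usage.
# TYPE process_memory_bytes gauge
process_memory_bytes 5.2e+07

# HELP request_duration_seconds Request latency distribution.
# TYPE request_duration_seconds histogram
request_duration_seconds_bucket{le="0.1"} 240
request_duration_seconds_bucket{le="0.5"} 310
request_duration_seconds_bucket{le="+Inf"} 325
request_duration_seconds_sum 48.3
request_duration_seconds_count 325
```

Note that a histogram is exposed as cumulative `_bucket` series plus `_sum` and `_count`, which is why quantiles can be computed server-side with `histogram_quantile()`.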

Deploying Prometheus with kube-prometheus-stack

The recommended way to deploy Prometheus on Kubernetes is through the kube-prometheus-stack Helm chart, which includes Prometheus, Grafana, Alertmanager, and node-exporter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set prometheus.prometheusSpec.retention=30d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi \
  --set grafana.adminPassword=securePassword

This deploys:

  • Prometheus - Metrics collection and storage
  • Alertmanager - Alert routing and notification
  • Grafana - Dashboards and visualization
  • node-exporter - Host-level metrics (CPU, memory, disk)
  • kube-state-metrics - Kubernetes object state metrics

# Verify the deployment
kubectl get pods -n monitoring
kubectl get svc -n monitoring

Kubernetes Metrics Sources

| Source | Metrics | Endpoint |
|---|---|---|
| kube-apiserver | API request latency, counts | /metrics on :6443 |
| kubelet | Container CPU, memory, network | /metrics on :10250 |
| cAdvisor (in kubelet) | Container resource usage | /metrics/cadvisor on :10250 |
| kube-state-metrics | Object state (Pod phase, replicas) | /metrics on :8080 |
| node-exporter | Node CPU, memory, disk, network | /metrics on :9100 |
| CoreDNS | DNS query latency, cache stats | /metrics on :9153 |
| etcd | Cluster health, disk I/O | /metrics on :2379 |

ServiceMonitor CRD

The Prometheus Operator uses ServiceMonitor CRDs to define scrape targets declaratively:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app-metrics
  namespace: monitoring
  labels:
    release: monitoring  # Must match Prometheus Operator's serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - production
  selector:
    matchLabels:
      app: my-app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics

This tells Prometheus to scrape the port named metrics, every 30 seconds, on the Endpoints of every Service labeled app: my-app in the production namespace.

Application Instrumentation

Expose custom metrics from your application:

# Application Deployment with metrics port
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: production
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: app
          image: my-app:v2
          ports:
            - name: http
              containerPort: 8080
            - name: metrics
              containerPort: 9090
---
apiVersion: v1
kind: Service
metadata:
  name: my-app
  namespace: production
  labels:
    app: my-app
spec:
  selector:
    app: my-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
    - name: metrics
      port: 9090
      targetPort: 9090
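In practice you would instrument the application with an official Prometheus client library, but to make the contract concrete, here is a minimal sketch of a /metrics endpoint using only the Python standard library. The metric name and the counting logic are illustrative, not part of any real library API:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

REQUEST_COUNT = 0  # hypothetical counter, incremented by application traffic


def render_metrics() -> str:
    # Prometheus text exposition format: HELP/TYPE comments, then samples.
    return (
        "# HELP myapp_http_requests_total Total HTTP requests handled.\n"
        "# TYPE myapp_http_requests_total counter\n"
        f"myapp_http_requests_total {REQUEST_COUNT}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global REQUEST_COUNT
        if self.path == "/metrics":
            body = render_metrics().encode()
            self.send_response(200)
            # Content type Prometheus expects for the text format
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            REQUEST_COUNT += 1  # count non-metrics traffic
            self.send_response(200)
            self.end_headers()

    def log_message(self, fmt, *args):
        pass  # silence per-request logging


def start_server(port: int = 0) -> HTTPServer:
    """Serve /metrics in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client library (e.g. prometheus_client for Python) handles the same exposition format plus label escaping, histograms, and concurrency safety, so prefer it for real services.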

Key PromQL Queries for Kubernetes

# CPU usage per Pod
rate(container_cpu_usage_seconds_total{namespace="production"}[5m])

# Memory usage per Pod
container_memory_working_set_bytes{namespace="production"}

# Pod restart count
kube_pod_container_status_restarts_total{namespace="production"}

# Pods not ready (excluding Pods created within the last 5 minutes)
kube_pod_status_ready{condition="false"} == 1
  and on(namespace, pod) (time() - kube_pod_created > 300)

# Node CPU utilization percentage
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Persistent Volume usage
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100

# API server request rate
rate(apiserver_request_total[5m])

# API server error rate
rate(apiserver_request_total{code=~"5.."}[5m]) / rate(apiserver_request_total[5m]) * 100

Alerting Rules

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubernetes-alerts
  namespace: monitoring
  labels:
    release: monitoring
spec:
  groups:
    - name: kubernetes-pod-alerts
      rules:
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is crash looping"
            description: "Pod has restarted {{ $value }} times in the last 15 minutes."

        - alert: PodNotReady
          expr: kube_pod_status_phase{phase=~"Pending|Unknown"} > 0
          for: 15m
          labels:
            severity: critical
          annotations:
            summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} has been in a non-ready state for more than 15 minutes"

        - alert: HighMemoryUsage
          expr: |
            container_memory_working_set_bytes{container!=""}
            / on(namespace, pod, container) kube_pod_container_resource_limits{resource="memory"}
            > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} memory usage above 90%"

Accessing Dashboards

# Port-forward to Grafana
kubectl port-forward -n monitoring svc/monitoring-grafana 3000:80

# Port-forward to Prometheus UI
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-prometheus 9090:9090

# Port-forward to Alertmanager
kubectl port-forward -n monitoring svc/monitoring-kube-prometheus-alertmanager 9093:9093

Production Considerations

For production clusters, ensure Prometheus has sufficient storage and retention configured. Use remote write (Thanos, Cortex, or Mimir) for long-term storage and multi-cluster aggregation. Set resource requests and limits on Prometheus Pods to prevent OOM kills. Use recording rules to pre-compute expensive queries that power dashboards.
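As a sketch of that last point, a recording rule that pre-computes per-namespace CPU usage might look like the following (rule group and recorded series names here are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: recording-rules
  namespace: monitoring
  labels:
    release: monitoring  # must match the Operator's ruleSelector
spec:
  groups:
    - name: cpu-recording-rules
      interval: 1m
      rules:
        # Pre-compute the expensive rate() so dashboards query the
        # cheap recorded series instead of re-evaluating it per panel.
        - record: namespace:container_cpu_usage_seconds:rate5m
          expr: |
            sum by (namespace) (
              rate(container_cpu_usage_seconds_total{container!=""}[5m])
            )
```

Dashboards then chart `namespace:container_cpu_usage_seconds:rate5m` directly, which is evaluated once per interval instead of on every dashboard refresh.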

Why Interviewers Ask This

Interviewers ask this because monitoring is fundamental to operating Kubernetes in production, and Prometheus is the de facto standard for Kubernetes observability.

Common Follow-Up Questions

How does Prometheus discover scrape targets in Kubernetes?
Through kubernetes_sd_config, which queries the Kubernetes API for Pods, Services, Endpoints, Ingresses, and Nodes; relabeling rules then filter and rewrite the discovered targets based on their labels or annotations.
What is a ServiceMonitor?
A CRD from the Prometheus Operator that declaratively defines scrape targets. Prometheus automatically configures itself based on ServiceMonitor objects.
How do you expose custom metrics from your application?
Instrument your application with a Prometheus client library and expose a /metrics endpoint. Then create a ServiceMonitor to scrape it.

Key Takeaways

  • Prometheus uses a pull-based model, scraping /metrics endpoints on targets
  • Kubernetes service discovery automates target configuration
  • kube-prometheus-stack provides a production-ready monitoring setup