Prometheus — K8s SRE Reference

TL;DR

Prometheus scrapes metrics endpoints on an interval and stores the samples as time series. On K8s, kube-prometheus-stack uses ServiceMonitor/PodMonitor CRs for discovery. Use kubectl top for quick checks, PromQL for trends, and PrometheusRule + Alertmanager for paging.

Metrics vs Logs vs Events vs Traces

Signal	Best for	Tool
Metrics	Trends, saturation, SLOs, alerting	Prometheus, Grafana
Logs	App errors, stack traces, request details	kubectl logs, Loki, CloudWatch
Events	K8s decisions (scheduling, probes, OOM)	kubectl get events
Traces	Latency across services	Jaeger, Tempo, Zipkin

Quick Checks

bash resource-usage.sh

# Requires metrics-server.
kubectl top nodes
kubectl top pods -A --sort-by=cpu
kubectl top pods -A --sort-by=memory

kubectl get events -A --sort-by=.lastTimestamp | tail -n 50
kubectl get pods -A --field-selector=status.phase!=Running   # Also lists Succeeded (completed) pods.

kube-prometheus-stack

Most client clusters use the kube-prometheus-stack Helm chart from prometheus-community. It installs Prometheus Operator, Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics.

The Prometheus Operator generates the scrape configuration from ServiceMonitor and PodMonitor CRs, so targets are not listed in static scrape configs.

bash stack-checks.sh

kubectl get pods,svc -n monitoring
kubectl get prometheus,alertmanager -n monitoring
kubectl get servicemonitor,podmonitor,prometheusrule -A

# Each port-forward blocks; run them in separate terminals.
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
kubectl port-forward -n monitoring svc/alertmanager-operated 9093:9093
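
With the port-forwards running, both components can be probed over HTTP through their /-/ready endpoints. A minimal sketch, assuming the forwards listen on localhost:9090 and localhost:9093; PROM_URL, AM_URL, and the function names are illustrative and not part of the chart.

```shell
#!/bin/sh
# Assumed local endpoints from the port-forwards; override via environment.
PROM_URL="${PROM_URL:-http://localhost:9090}"
AM_URL="${AM_URL:-http://localhost:9093}"

# Return 0 when the readiness endpoint of the given base URL answers HTTP 200.
is_ready() {
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1/-/ready")
  [ "$code" = "200" ]
}

# Print a one-line status for Prometheus and Alertmanager.
stack_status() {
  for url in "$PROM_URL" "$AM_URL"; do
    if is_ready "$url"; then
      echo "$url ready"
    else
      echo "$url NOT ready"
    fi
  done
}
```

Source the file and run stack_status; a "NOT ready" line usually means the port-forward is down or the pod is still starting.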

ServiceMonitor

Tells the Prometheus Operator which Services to scrape. The ServiceMonitor's own labels must match the Prometheus CR's serviceMonitorSelector, and endpoints[].port must be the name of a port on the Service.

yaml servicemonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-api
  namespace: app
  labels:
    release: prometheus   # Must match Prometheus serviceMonitorSelector.
spec:
  selector:
    matchLabels:
      app: web-api
  namespaceSelector:
    matchNames:
      - app
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
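
When a ServiceMonitor produces no targets, the two usual suspects are the label selector on the Prometheus CR and the port name on the Service. The sketch below prints both for comparison. The function names are hypothetical helpers; the namespaces and labels (monitoring, app, app=web-api) are taken from the example above and will differ per cluster.

```shell
#!/bin/sh
# Print name and serviceMonitorSelector of each Prometheus CR in a namespace.
prometheus_sm_selectors() {
  kubectl get prometheus -n "${1:-monitoring}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.serviceMonitorSelector}{"\n"}{end}'
}

# Print the port names of all Services matching a label selector.
# Arguments: namespace, label selector.
service_port_names() {
  kubectl get svc -n "$1" -l "$2" -o jsonpath='{.items[*].spec.ports[*].name}'
}

# Succeed when one of those ports carries the expected name (default: metrics).
has_named_port() {
  for p in $(service_port_names "$1" "$2"); do
    [ "$p" = "${3:-metrics}" ] && return 0
  done
  return 1
}
```

For example, `has_named_port app app=web-api || echo "no port named metrics"` flags a Service whose port is unnamed or named differently from the endpoint in the ServiceMonitor.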

PodMonitor

Scrape Pods directly when there's no Service — common for DaemonSets or hostNetwork pods.

yaml podmonitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: node-exporter-custom
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: node-exporter
  podMetricsEndpoints:
    - port: metrics
      interval: 30s
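
In a PodMonitor, port refers to a named container port, so pods that do not declare that name are never scraped. A small sketch for spotting them, assuming kubectl access; the helper names are hypothetical, and the namespace and label come from the example above.

```shell
#!/bin/sh
# Print each matching pod followed by its named container ports.
# Arguments: namespace, label selector.
pod_port_names() {
  kubectl get pods -n "$1" -l "$2" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].ports[*].name}{"\n"}{end}'
}

# Print the pods that lack a container port with the given name (default: metrics).
pods_missing_port() {
  pod_port_names "$1" "$2" | awk -v want="${3:-metrics}" '
    {
      found = 0
      for (i = 2; i <= NF; i++) if ($i == want) found = 1
      if (!found) print $1
    }'
}
```

Running `pods_missing_port monitoring app=node-exporter` prints nothing when every selected pod exposes the named port.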

PromQL Starters

promql queries.promql

# CPU usage by pod (cores).
sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (namespace,pod)

# Memory working set by pod.
sum(container_memory_working_set_bytes{container!="",pod!=""}) by (namespace,pod)

# Container restarts over the last 15m (a count, not a per-second rate).
sum(increase(kube_pod_container_status_restarts_total[15m])) by (namespace,pod,container)

# Pending pods.
sum(kube_pod_status_phase{phase="Pending"}) by (namespace)

# HTTP 5xx rate (app must expose http_requests_total).
sum(rate(http_requests_total{status=~"5.."}[5m])) by (namespace,service)

# Node disk pressure.
kube_node_status_condition{condition="DiskPressure",status="true"} == 1
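
The queries above can also be run from a terminal through the Prometheus HTTP API (/api/v1/query and /api/v1/query_range). A minimal sketch, assuming Prometheus is reachable on localhost:9090 through a port-forward; PROM_URL, promq, and promq_range are illustrative names, and the 60s step is an arbitrary choice.

```shell
#!/bin/sh
# Assumed base URL of a port-forwarded Prometheus; override via environment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Instant query: evaluates the expression at the current time, prints raw JSON.
promq() {
  curl -sS -G "$PROM_URL/api/v1/query" --data-urlencode "query=$1"
}

# Range query over the last N seconds (default 3600) at a 60s step.
promq_range() {
  end=$(date +%s)
  start=$((end - ${2:-3600}))
  curl -sS -G "$PROM_URL/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$start" \
    --data-urlencode "end=$end" \
    --data-urlencode "step=60"
}
```

For example, `promq 'sum(kube_pod_status_phase{phase="Pending"}) by (namespace)'` returns the same data as the Prometheus UI; single quotes keep the shell from touching the braces and double quotes in the expression.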

PrometheusRule & Alertmanager

yaml prometheusrule.yaml

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-api-alerts
  namespace: app
  labels:
    release: prometheus
spec:
  groups:
    - name: web-api
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{job="web-api",status=~"5.."}[5m]))
            / sum(rate(http_requests_total{job="web-api"}[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High 5xx rate on web-api"
            description: "Error rate above 5% for 5 minutes."
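
The expression in this rule divides the 5xx request rate by the total request rate and compares the result with 0.05. The arithmetic can be sketched for a single sample as follows; error_ratio is a hypothetical helper, and the input numbers are made up for illustration.

```shell
#!/bin/sh
# Evaluate the alert condition for one sample of request rates.
# Arguments: 5xx rate, total rate, optional threshold (default 0.05).
error_ratio() {
  awk -v err="$1" -v total="$2" -v thr="${3:-0.05}" 'BEGIN {
    # With no traffic the ratio is undefined, so the comparison cannot hold.
    if (total == 0) { print "ratio undefined (no traffic)"; exit }
    r = err / total
    printf "ratio=%.3f %s\n", r, (r > thr ? "above threshold" : "not above threshold")
  }'
}
```

`error_ratio 6 100` prints `ratio=0.060 above threshold`. In the rule itself, the condition must stay true for the whole `for: 5m` window before the alert moves from pending to firing.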

bash alertmanager.sh

# Check firing alerts in Prometheus UI: /alerts
# Or via Alertmanager UI: /#/alerts (port-forward 9093).

kubectl get prometheusrule -A
kubectl describe prometheusrule web-api-alerts -n app
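
A PrometheusRule that exists in the cluster is not necessarily loaded by Prometheus. One way to check is to ask the Prometheus HTTP API (/api/v1/rules and /api/v1/alerts). A rough sketch, assuming a port-forward on localhost:9090; the helper names are hypothetical, and the grep relies on the compact JSON the API normally returns.

```shell
#!/bin/sh
# Assumed base URL of a port-forwarded Prometheus; override via environment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Succeed when the rules API reports an entry with the given name.
# Note: rule groups carry a "name" field too, so group names also match.
rule_loaded() {
  curl -sS "$PROM_URL/api/v1/rules" | grep -q "\"name\":\"$1\""
}

# Print the raw JSON list of pending and firing alerts.
active_alerts() {
  curl -sS "$PROM_URL/api/v1/alerts"
}
```

For example, `rule_loaded HighErrorRate || echo "rule not loaded"`. If the rule is missing, compare the labels on the PrometheusRule with the ruleSelector of the Prometheus CR.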

Container Logs

bash logs.sh

kubectl logs <pod> -n <ns> -c <container> --tail=200
kubectl logs <pod> -n <ns> -c <container> --previous   # Crashed container.
kubectl logs -n <ns> -l app=<label> --all-containers --tail=100

Gotchas

!ServiceMonitor labels must match Prometheus serviceMonitorSelector — missing label = no scrape.
!High cardinality — avoid unbounded label values (user IDs, URLs) in custom metrics.
!rate() and increase() take a range vector: always pass a window such as [5m], and keep it at least 4x the scrape interval so every window holds enough samples.
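!Retention: upstream Prometheus keeps 15 days by default (--storage.tsdb.retention.time), and the chart can override this through the Prometheus CR's retention field; use Thanos for long-term storage.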
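
Two of the gotchas above lend themselves to quick checks. The sketch below derives a rate() window from the scrape interval using the common four-intervals rule of thumb, and fetches TSDB statistics from /api/v1/status/tsdb to look for high-cardinality metrics. PROM_URL and the function names are assumptions for illustration.

```shell
#!/bin/sh
# Assumed base URL of a port-forwarded Prometheus; override via environment.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Suggest a rate()/increase() window of four scrape intervals (rule of thumb).
# Argument: scrape interval in seconds.
rate_window() {
  echo "$(( $1 * 4 ))s"
}

# Fetch TSDB statistics as raw JSON, including series counts per metric name
# and per label name, which helps locate unbounded label values.
tsdb_stats() {
  curl -sS "$PROM_URL/api/v1/status/tsdb"
}
```

With the 30s interval used in the examples, `rate_window 30` prints `120s`, so [5m] leaves comfortable headroom.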