Prometheus & Metrics
TL;DR
Prometheus scrapes metrics endpoints on a fixed interval and stores the samples as time series. On Kubernetes, kube-prometheus-stack runs the Prometheus Operator, which discovers scrape targets through ServiceMonitor and PodMonitor custom resources (CRs). Use `kubectl top` for quick checks, PromQL for trends, and PrometheusRule plus Alertmanager for paging.
Metrics vs Logs vs Events vs Traces
| Signal | Best for | Tool |
|---|---|---|
| Metrics | Trends, saturation, SLOs, alerting | Prometheus, Grafana |
| Logs | App errors, stack traces, request details | kubectl logs, Loki, CloudWatch |
| Events | K8s decisions — scheduling, probes, OOM | kubectl get events |
| Traces | Latency across services | Jaeger, Tempo, Zipkin |
Quick Checks
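```bash
# resource-usage.sh
# kubectl top requires metrics-server.
kubectl top nodes
kubectl top pods -A --sort-by=cpu
kubectl top pods -A --sort-by=memory

# Most recent 50 events across all namespaces.
kubectl get events -A --sort-by=.lastTimestamp | tail -n 50

# Pods not in the Running phase (also lists Succeeded pods, e.g. completed Jobs).
kubectl get pods -A --field-selector=status.phase!=Running
```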
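To turn the `kubectl top` output into a quick filter, a small awk wrapper is enough. This is a sketch: `pods_over_mem` is a hypothetical helper, and it assumes the four-column layout that `kubectl top pods -A --no-headers` prints (namespace, pod, CPU, memory).

```bash
# pods_over_mem THRESHOLD_MI (hypothetical helper): read
# `kubectl top pods -A --no-headers` lines on stdin and print namespace/pod
# for every pod whose memory column exceeds THRESHOLD_MI mebibytes.
pods_over_mem() {
  awk -v t="$1" '
    {
      mem = $4
      # Normalise the memory column to MiB; kubectl top usually prints "Mi".
      if (mem ~ /Gi$/)      { sub(/Gi$/, "", mem); mi = mem * 1024 }
      else if (mem ~ /Mi$/) { sub(/Mi$/, "", mem); mi = mem + 0 }
      else if (mem ~ /Ki$/) { sub(/Ki$/, "", mem); mi = mem / 1024 }
      else                  { mi = mem / 1048576 }   # plain bytes
      if (mi > t + 0) print $1 "/" $2
    }'
}

# Usage:
#   kubectl top pods -A --no-headers | pods_over_mem 500
```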
kube-prometheus-stack
Most client clusters use the prometheus-community Helm chart kube-prometheus-stack. It installs the Prometheus Operator, Prometheus, Alertmanager, Grafana, node-exporter, and kube-state-metrics.
The Operator builds the Prometheus scrape configuration from ServiceMonitor and PodMonitor CRs rather than from hand-written static scrape configs.
```bash
# stack-checks.sh
kubectl get pods,svc -n monitoring
kubectl get prometheus,alertmanager -n monitoring
kubectl get servicemonitor,podmonitor,prometheusrule -A

# Each port-forward blocks until interrupted; run them in separate terminals.
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090
kubectl port-forward -n monitoring svc/alertmanager-operated 9093:9093
```
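Because the Operator generates the scrape configuration, it can help to read what it actually produced. A minimal sketch, assuming the Operator stores the rendered config gzipped in a Secret named `prometheus-<prometheus-cr-name>` under the key `prometheus.yaml.gz` (verify this against your Operator version); `scrape_jobs` is a hypothetical helper that lists the job names from that YAML.

```bash
# scrape_jobs (hypothetical helper): read a Prometheus config (YAML) on stdin
# and print each job_name, one per line. Operator-generated jobs are named
# like serviceMonitor/<namespace>/<name>/<endpoint-index>.
scrape_jobs() {
  sed -n 's/^[[:space:]]*-\{0,1\}[[:space:]]*job_name:[[:space:]]*//p'
}

# Usage (assumed Secret name and key; take the CR name from `kubectl get prometheus -n monitoring`):
#   kubectl get secret prometheus-<prometheus-cr-name> -n monitoring \
#     -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gunzip | scrape_jobs
```

If `serviceMonitor/app/web-api/0` is missing from the list, the ServiceMonitor was not selected, which points back at its labels.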
ServiceMonitor
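A ServiceMonitor tells the Prometheus Operator which Services to scrape and on which named port. Its labels must match the Prometheus CR's `serviceMonitorSelector`, and its namespace must be covered by `serviceMonitorNamespaceSelector`, or the Operator ignores it.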
```yaml
# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-api
  namespace: app
  labels:
    release: prometheus # Must match the Prometheus CR's serviceMonitorSelector.
spec:
  selector:
    matchLabels:
      app: web-api # Labels on the target Service.
  namespaceSelector:
    matchNames:
      - app
  endpoints:
    - port: metrics # Name of the Service port, not its number.
      interval: 30s
      path: /metrics
```
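ServiceMonitor selection uses ordinary `matchLabels` semantics: every key/value pair in the selector must be present on the object, and extra labels are ignored. The hypothetical `selector_matches` helper below models that rule for comma-separated `key=value` lists, the format `kubectl get ... --show-labels` prints. It is a sketch and ignores `matchExpressions`.

```bash
# selector_matches SELECTOR LABELS (hypothetical helper)
# Both arguments are comma-separated key=value lists, e.g. "release=prometheus".
# Returns 0 when every selector pair appears in LABELS (matchLabels semantics).
# An empty SELECTOR matches everything, like an empty matchLabels.
selector_matches() {
  local selector="$1" labels="$2" pair
  local IFS=','
  for pair in $selector; do
    case ",$labels," in
      *",$pair,"*) ;;     # pair present, keep checking
      *) return 1 ;;      # any missing or different pair means no match
    esac
  done
  return 0
}

# Usage: compare the Prometheus CR's selector with the ServiceMonitor's labels.
#   kubectl get prometheus -n monitoring -o jsonpath='{.items[*].spec.serviceMonitorSelector}'
#   kubectl get servicemonitor web-api -n app --show-labels
#   selector_matches "release=prometheus" "app=web-api,release=prometheus" && echo match
```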
PodMonitor
Use a PodMonitor to scrape Pods directly when no Service fronts them, which is common for DaemonSets and hostNetwork pods. Here `port` refers to a named container port.
```yaml
# podmonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: node-exporter-custom
  namespace: monitoring
  labels:
    release: prometheus # Must match the Prometheus CR's podMonitorSelector.
spec:
  selector:
    matchLabels:
      app: node-exporter # Labels on the target Pods.
  podMetricsEndpoints:
    - port: metrics # Name of the container port.
      interval: 30s
```
PromQL Starters
```promql
# queries.promql
# CPU usage by pod (cores).
sum(rate(container_cpu_usage_seconds_total{container!="",pod!=""}[5m])) by (namespace,pod)

# Memory working set by pod (bytes).
sum(container_memory_working_set_bytes{container!="",pod!=""}) by (namespace,pod)

# Container restarts in the last 15m (a count, not a per-second rate).
sum(increase(kube_pod_container_status_restarts_total[15m])) by (namespace,pod,container)

# Pending pods per namespace.
sum(kube_pod_status_phase{phase="Pending"}) by (namespace)

# HTTP 5xx rate (app must expose http_requests_total; the status label name depends on the instrumentation).
sum(rate(http_requests_total{status=~"5.."}[5m])) by (namespace,service)

# Nodes under disk pressure (without "== 1" every node is returned, with value 0 or 1).
kube_node_status_condition{condition="DiskPressure",status="true"} == 1
```
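`rate()` and `increase()` work on counters, which only go up except when the process restarts and the counter drops back to zero. The sketch below shows the core reset handling on plain samples. It is simplified: the helper names are hypothetical, and it skips the extrapolation to the window edges that Prometheus does apply, so its numbers will not match PromQL exactly.

```bash
# counter_increase (hypothetical helper): read counter samples on stdin, one
# value per line, oldest first, and print the total increase. A drop is
# treated as a counter reset, so the new value is counted from zero.
counter_increase() {
  awk '
    { cur = $1 + 0 }                      # force numeric comparison
    NR == 1 { prev = cur; next }
    {
      if (cur < prev) total += cur        # reset: counter restarted at 0
      else            total += cur - prev # normal monotonic growth
      prev = cur
    }
    END { print total + 0 }'
}

# counter_rate WINDOW_SECONDS (hypothetical helper): per-second average,
# i.e. the increase divided by the window length.
counter_rate() {
  counter_increase | awk -v w="$1" '{ print $1 / w }'
}

# Usage: samples 10, 15, 3, 8 contain one reset (15 -> 3).
#   printf '10\n15\n3\n8\n' | counter_increase   # prints 13 (5 + 3 + 5)
```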
PrometheusRule & Alertmanager
```yaml
# prometheusrule.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: web-api-alerts
  namespace: app
  labels:
    release: prometheus # Must match the Prometheus CR's ruleSelector.
spec:
  groups:
    - name: web-api
      rules:
        - alert: HighErrorRate
          expr: |
            sum(rate(http_requests_total{job="web-api",status=~"5.."}[5m]))
              / sum(rate(http_requests_total{job="web-api"}[5m])) > 0.05
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "High 5xx rate on web-api"
            description: "Error rate above 5% for 5 minutes."
```
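The `for: 5m` clause means the expression must keep returning results on every evaluation for five minutes before the alert moves from pending to firing; a single evaluation without results resets it to inactive. A simplified model of those state transitions, using a hypothetical `alert_states` helper that takes the rule evaluation interval and the `for` duration in seconds:

```bash
# alert_states EVAL_INTERVAL_S FOR_S (hypothetical helper): read one 0/1 per
# line (did the expression return results at that evaluation?) and print the
# alert state after each evaluation: inactive, pending or firing.
alert_states() {
  awk -v step="$1" -v hold="$2" '
    BEGIN { since = -1 }                     # -1 = condition not active
    {
      now = (NR - 1) * step
      if ($1 == 0) { since = -1; print "inactive"; next }
      if (since < 0) since = now             # condition just became true
      if (now - since >= hold + 0) print "firing"
      else                         print "pending"
    }'
}

# Usage: 60s evaluations, for: 5m (300s); six true evaluations, then one false.
#   printf '1\n1\n1\n1\n1\n1\n0\n' | alert_states 60 300
```

With `for` omitted (0 seconds), the first true evaluation fires immediately.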
```bash
# alertmanager.sh
# Firing alerts: Prometheus UI at /alerts (port-forward 9090),
# or Alertmanager UI at /#/alerts (port-forward 9093).
# Confirm the rule group was loaded: Prometheus UI at /rules.
kubectl get prometheusrule -A
kubectl describe prometheusrule web-api-alerts -n app
```
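Firing alerts can also be pulled from the Prometheus HTTP API instead of the UI: every active alert is exposed as an `ALERTS` series with an `alertstate` label. A minimal sketch, assuming the 9090 port-forward is running; `prom_query` and `PROM_URL` are hypothetical names.

```bash
# prom_query EXPR (hypothetical helper): run an instant query against the
# Prometheus HTTP API (/api/v1/query) and print the raw JSON response.
# PROM_URL is a hypothetical override; it defaults to the port-forward address.
prom_query() {
  curl -fsS --get "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Usage:
#   prom_query 'ALERTS{alertstate="firing"}'
#   prom_query 'up == 0'    # targets that failed their last scrape
```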
Container Logs
```bash
# logs.sh
kubectl logs <pod> -n <ns> -c <container> --tail=200
kubectl logs <pod> -n <ns> -c <container> --previous  # Previous (crashed) container instance.
kubectl logs -n <ns> -l app=<label> --all-containers --tail=100
```
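To find which pods need the `--previous` flag, one option is to filter `kubectl get pods` for `CrashLoopBackOff`. A sketch, assuming the default `-A --no-headers` layout where STATUS is the fourth column; `crashloop_pods` is a hypothetical helper. A pod caught between restarts may briefly show `Error` instead and will be missed.

```bash
# crashloop_pods (hypothetical helper): read `kubectl get pods -A --no-headers`
# on stdin and print "namespace pod" for pods whose STATUS is CrashLoopBackOff.
crashloop_pods() {
  awk '$4 == "CrashLoopBackOff" { print $1, $2 }'
}

# Usage: last 50 lines from the previous instance of each crash-looping pod.
#   kubectl get pods -A --no-headers | crashloop_pods |
#     while read -r ns pod; do kubectl logs "$pod" -n "$ns" --previous --tail=50; done
```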
Gotchas
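- ServiceMonitor labels must match the Prometheus CR's `serviceMonitorSelector`; a missing label means the target is never scraped.
- High cardinality: avoid unbounded label values (user IDs, URLs) in custom metrics, since every distinct label combination is a separate time series.
- `rate()` and `increase()` need a range vector: always pass `[5m]` or a similar range.
- Retention: upstream Prometheus keeps 15 days by default, but kube-prometheus-stack sets its own value (`prometheus.prometheusSpec.retention`, 10d in the chart's default values). Use Thanos for long-term storage.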
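The cardinality gotcha is multiplicative: in the worst case a metric has one series per combination of label values, so the series count is the product of the number of distinct values of each label. A hypothetical `max_series` helper makes the arithmetic concrete:

```bash
# max_series N1 N2 ... (hypothetical helper): worst-case number of series for
# one metric name, given how many distinct values each label can take,
# assuming every combination is eventually observed.
max_series() {
  local total=1 n
  for n in "$@"; do
    total=$(( total * n ))
  done
  echo "$total"
}
```

For example, `max_series 5 10 200` (methods × status codes × paths) prints 10000; adding a `user_id` label with 100000 distinct values, `max_series 5 10 200 100000`, prints 1000000000.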