Grafana
TL;DR
Grafana visualizes metrics from Prometheus (and other datasources). On contract, you'll use pre-built dashboards, Explore for ad-hoc PromQL, and provisioned datasources/dashboards from Git or Helm values.
Access & Port-Forward
bash
grafana-access.sh
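# Find the Grafana pod and Service (names below assume the Helm release is called "prometheus").
kubectl get pods,svc -n monitoring -l app.kubernetes.io/name=grafana
# Forward local port 3000 to the Service, then open http://localhost:3000.
kubectl port-forward -n monitoring svc/prometheus-grafana 3000:80
# Read the admin password from the Secret (user "admin"; kube-prometheus-stack ships prom-operator as the default unless overridden).
kubectl get secret -n monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 -d; echo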
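Before opening the browser, it can help to confirm that the port-forward actually reaches Grafana. The sketch below calls Grafana's unauthenticated `/api/health` endpoint. The `GRAFANA_URL` variable, the `CURL` override, and the function name `grafana_health` are assumptions made for this example, not part of the original setup.

```shell
#!/usr/bin/env sh
# Sketch: check that the port-forwarded Grafana responds.
# GRAFANA_URL and CURL are illustrative variables; override them as needed.
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
CURL="${CURL:-curl}"

grafana_health() {
  # /api/health needs no login and returns a small JSON document
  # that includes the database status and the Grafana version.
  "$CURL" -sS --max-time 5 "${GRAFANA_URL}/api/health"
}
```

With the port-forward running, call `grafana_health`. A connection error usually means the port-forward has exited or the Service name differs from the one assumed here.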
Explore (Ad-hoc Queries)
Use Explore when investigating incidents — run PromQL without editing dashboards.
- Open Grafana → Explore → select Prometheus datasource.
- Run queries from the PromQL starters.
- Adjust time range; use "Split" to compare periods.
- Copy working queries into dashboard panels or incident notes.
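For incident notes, the same PromQL can be run outside the UI against the Prometheus HTTP API, which makes the result easy to paste or diff. This is a minimal sketch: it assumes Prometheus is reachable on `http://localhost:9090` (for example through `kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090`), and the function names and the `PROM_URL` and `CURL` variables are hypothetical helpers.

```shell
#!/usr/bin/env sh
# Sketch: run PromQL through the Prometheus HTTP API.
# PROM_URL assumes a local port-forward; adjust it for your cluster.
PROM_URL="${PROM_URL:-http://localhost:9090}"
CURL="${CURL:-curl}"

promql_instant() {
  # usage: promql_instant '<query>'
  # -G sends the url-encoded data as a query string on a GET request.
  "$CURL" -sS -G "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}

promql_range() {
  # usage: promql_range '<query>' <start> <end> [step]
  # start and end accept Unix timestamps or RFC 3339 times.
  "$CURL" -sS -G "${PROM_URL}/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=$2" \
    --data-urlencode "end=$3" \
    --data-urlencode "step=${4:-60s}"
}
```

Calling `promql_range` twice with different start and end values gives a rough command-line equivalent of comparing two periods in a split view.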
Useful Built-in Dashboards
| Dashboard | What it shows |
|---|---|
| Kubernetes / Compute Resources / Namespace (Pods) | CPU/memory by pod |
| Kubernetes / Compute Resources / Node (Pods) | Per-node pod resource usage |
| Kubernetes / Networking / Namespace (Pods) | Network I/O by pod |
| Node Exporter / Nodes | Host-level CPU, disk, memory |
| Prometheus / Overview | Scrape health, TSDB stats |
Dashboard Variables
Common pattern for multi-env dashboards — filter by namespace, cluster, or deployment.
promql
variable-queries.promql
# Variable: namespace
label_values(kube_pod_info, namespace)
# Variable: pod (depends on $namespace)
label_values(kube_pod_info{namespace="$namespace"}, pod)
# Panel query using variables (container!="" drops the pod-level cgroup series so CPU is not counted twice).
sum(rate(container_cpu_usage_seconds_total{namespace="$namespace",pod="$pod",container!=""}[5m]))
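When a variable dropdown comes up empty, one way to debug it is to ask Prometheus directly for the label values that `label_values(...)` would return. The sketch below uses the Prometheus label-values endpoint with an optional series selector. It assumes Prometheus is port-forwarded to `http://localhost:9090`; the shell function `label_values` and the `PROM_URL` and `CURL` variables are illustrative names.

```shell
#!/usr/bin/env sh
# Sketch: list label values the way a Grafana variable query would see them.
PROM_URL="${PROM_URL:-http://localhost:9090}"
CURL="${CURL:-curl}"

label_values() {
  # usage: label_values <label> [series selector]
  if [ -n "${2:-}" ]; then
    # match[] restricts the result to series that match the selector.
    "$CURL" -sS -G "${PROM_URL}/api/v1/label/$1/values" \
      --data-urlencode "match[]=$2"
  else
    "$CURL" -sS "${PROM_URL}/api/v1/label/$1/values"
  fi
}
```

For example, `label_values namespace kube_pod_info` roughly mirrors the namespace variable, and `label_values pod 'kube_pod_info{namespace="monitoring"}'` mirrors the dependent pod variable for one namespace.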
Provisioning via Helm
yaml
values-grafana.yaml
# kube-prometheus-stack subchart values.
grafana:
  adminPassword: changeme
  persistence:
    enabled: true
  dashboardProviders:
    dashboardproviders.yaml:
      apiVersion: 1
      providers:
        - name: default
          folder: Platform
          type: file
          options:
            path: /var/lib/grafana/dashboards
  dashboards:
    default:
      node-exporter:
        gnetId: 1860
        revision: 27
        datasource: Prometheus
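Applying these values and checking the result might look like the sketch below. It assumes the release is named `prometheus` (inferred from the `prometheus-grafana` Service), that the `prometheus-community` Helm repository is already added, and that the Grafana container is called `grafana`. The function names and the `HELM`, `KUBECTL`, `RELEASE`, and `NAMESPACE` variables are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch: roll out the values file, then list the provisioned dashboard files.
HELM="${HELM:-helm}"
KUBECTL="${KUBECTL:-kubectl}"
RELEASE="${RELEASE:-prometheus}"      # assumed release name
NAMESPACE="${NAMESPACE:-monitoring}"

apply_grafana_values() {
  # usage: apply_grafana_values [values file]
  "$HELM" upgrade --install "$RELEASE" prometheus-community/kube-prometheus-stack \
    --namespace "$NAMESPACE" --values "${1:-values-grafana.yaml}"
}

list_provisioned_dashboards() {
  # Lists dashboard files under the provider path, including subdirectories.
  "$KUBECTL" exec -n "$NAMESPACE" "deploy/${RELEASE}-grafana" -c grafana -- \
    ls -R /var/lib/grafana/dashboards
}
```

If the listing is empty after a rollout, the dashboard download or the provider path is the first thing to check.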
Datasource Config
yaml
datasource.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus-operated.monitoring.svc:9090
    isDefault: true
    editable: false
  - name: Thanos
    type: prometheus
    url: http://thanos-query.monitoring.svc:9090
    editable: false
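To confirm that Grafana actually loaded the provisioned datasources, one option is to list them through the Grafana HTTP API. This sketch assumes the port-forward on `http://localhost:3000` and basic auth with the admin user; `GRAFANA_PASSWORD` must be supplied by the caller, and the function and variable names are illustrative.

```shell
#!/usr/bin/env sh
# Sketch: list the datasources Grafana knows about (expects JSON output).
GRAFANA_URL="${GRAFANA_URL:-http://localhost:3000}"
GRAFANA_USER="${GRAFANA_USER:-admin}"
CURL="${CURL:-curl}"

list_datasources() {
  # Fails with a message if GRAFANA_PASSWORD is unset or empty.
  "$CURL" -sS -u "${GRAFANA_USER}:${GRAFANA_PASSWORD:?set GRAFANA_PASSWORD first}" \
    "${GRAFANA_URL}/api/datasources"
}
```

The response should include entries named `Prometheus` and `Thanos` with the URLs from the file above. A missing entry usually points at a provisioning file that was not mounted or failed to parse.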
Gotchas
- No data in panels — check datasource URL, time range, and that Prometheus actually scrapes the metric.
- Dashboard UID conflicts — when provisioning from Git, ensure unique UIDs per dashboard.
- Thanos datasource — point Grafana at Thanos Query for long-range queries spanning multiple Prometheus instances.
- Don't edit prod dashboards in UI if they're Git-provisioned — changes will be overwritten on sync.
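The UID conflict gotcha can be caught before a sync with a small check over the dashboard JSON files in the Git checkout. The sketch below assumes `jq` is installed and that each file holds one dashboard with a top-level `uid` field. The function name `find_duplicate_uids` and the directory argument are hypothetical.

```shell
#!/usr/bin/env sh
# Sketch: print every dashboard UID that appears in more than one JSON file.
find_duplicate_uids() {
  # usage: find_duplicate_uids <directory with dashboard JSON files>
  # '.uid // empty' skips files that have no top-level uid.
  find "$1" -type f -name '*.json' -exec jq -r '.uid // empty' {} + \
    | sort | uniq -d
}
```

Running `find_duplicate_uids dashboards/` in CI and failing the job when the output is non-empty is one way to keep conflicts out of the provisioned set.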