Logging & Monitoring Knowledge

Practice

Practice the observability basics: metrics-server, Prometheus, Grafana, log agents, events, alerting, audit logs, and metadata for telemetry.

Strong operations work separates logs, metrics, events, health probes, and alerts, then uses each signal for the job it does best.

Questions

What is metrics-server used for?

metrics-server provides recent CPU and memory resource metrics for Pods and nodes. It powers kubectl top, Horizontal Pod Autoscaler resource metrics, and lightweight dashboard views. It is not a long-term metrics store.

How do you install metrics-server?

For a quick lab install, apply the official manifest with kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml. For production, pin a version, review TLS settings, and install through a controlled release process.

How do you check Pod CPU and memory usage?

Use kubectl top pod in the current namespace, or kubectl top pod -A across namespaces. Use kubectl top pod <pod> --containers when you need per-container usage.
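The values that kubectl top prints are Kubernetes quantities: CPU in millicores (250m is a quarter of a core) and memory with binary suffixes (128Mi). The following is a minimal sketch of converting them for comparison or scripting; the helper names and the sample rows are invented for illustration, and it assumes the default three-column output without -A or --containers.

```python
# Hypothetical helpers for reading `kubectl top pod` values; the sample
# rows below are invented for illustration, not real cluster output.

MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_to_cores(value: str) -> float:
    """Convert a CPU quantity such as '250m' (millicores) or '2' to cores."""
    if value.endswith("m"):
        return int(value[:-1]) / 1000
    return float(value)


def memory_to_bytes(value: str) -> int:
    """Convert a memory quantity such as '128Mi' to bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * factor
    return int(value)  # plain bytes, no suffix


def parse_top_output(text: str) -> list[dict]:
    """Parse the NAME / CPU(cores) / MEMORY(bytes) columns, skipping the header."""
    rows = []
    for line in text.strip().splitlines()[1:]:
        name, cpu, memory = line.split()
        rows.append(
            {"name": name, "cores": cpu_to_cores(cpu), "bytes": memory_to_bytes(memory)}
        )
    return rows


SAMPLE = """\
NAME        CPU(cores)   MEMORY(bytes)
web-abc     250m         128Mi
worker-xyz  1200m        2Gi
"""
```

Calling parse_top_output(SAMPLE) yields one dictionary per Pod, with CPU in cores and memory in bytes.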

What is Prometheus used for in Kubernetes?

Prometheus scrapes metrics from nodes, Pods, kubelets, exporters, and control-plane components. It stores time-series data and evaluates recording and alerting rules.

What is Grafana used for?

Grafana visualizes metrics, logs, and traces from data sources such as Prometheus, Loki, Tempo, and cloud monitoring systems. It is commonly used for dashboards, ad hoc exploration, and on-call views.

What are Fluentd and Fluent Bit used for?

Fluentd and Fluent Bit collect, parse, enrich, and forward logs to backends such as Elasticsearch, Loki, S3, or vendor platforms. Fluent Bit is commonly used as a lightweight node agent.

What is the difference between logs and metrics?

Logs are textual or structured event records that explain what happened. Metrics are numeric time-series values that show trends, rates, saturation, and health. Logs are rich but expensive to store and search at volume; metrics are compact and alert-friendly.
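The contrast can be made concrete with a small sketch that turns the same request events into both signal types. The events, field names, and function names here are invented for illustration and do not come from any particular logging or metrics library.

```python
import json
from collections import Counter

# Invented request events, used only to contrast the two signal types.
EVENTS = [
    {"path": "/checkout", "status": 500, "error": "payment gateway timeout"},
    {"path": "/checkout", "status": 200, "error": None},
    {"path": "/cart", "status": 200, "error": None},
]


def to_log_lines(events):
    """One structured record per event: detailed, but grows with traffic."""
    return [json.dumps(event, sort_keys=True) for event in events]


def to_counters(events):
    """One number per label set: compact, but the error text is gone."""
    return Counter((event["path"], str(event["status"])) for event in events)
```

The log lines keep the error message for later debugging, while the counters keep only how often each path and status pair occurred, which is the shape an alert condition usually needs.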

How do you view logs for a specific container?

Use kubectl logs <pod> -c <container>. Add --previous for the prior crashed container instance, and add -n <namespace> if the Pod is not in the current namespace.

How do you tail logs from a Pod in real time?

Use kubectl logs -f <pod>. For a specific container, add -c <container>. To skip older output and start from recent logs, combine with --since=10m or --tail=100.

What is a logging agent DaemonSet?

A logging agent DaemonSet runs one log-collection Pod on each node, usually reading container logs from host paths such as /var/log/containers. DaemonSet placement lets every node forward local logs without modifying each application.
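To show what such an agent does with a file under /var/log/containers, here is a minimal sketch of the parsing step. It assumes the conventional file name layout <pod>_<namespace>_<container>-<container-id>.log and the CRI line format of timestamp, stream, tag, and message; nodes using a different runtime log format would need a different parser. The function names and sample values are hypothetical.

```python
from pathlib import PurePosixPath


def parse_log_path(path: str) -> dict:
    """Split '<pod>_<namespace>_<container>-<container-id>.log' into fields."""
    stem = PurePosixPath(path).name.removesuffix(".log")
    pod, namespace, rest = stem.split("_")
    container, container_id = rest.rsplit("-", 1)
    return {
        "pod": pod,
        "namespace": namespace,
        "container": container,
        "container_id": container_id,
    }


def parse_cri_line(line: str) -> dict:
    """Split a CRI-format line: '<timestamp> <stream> <tag> <message>'."""
    timestamp, stream, tag, message = line.rstrip("\n").split(" ", 3)
    # Tag 'P' marks a partial line that continues in the next record.
    return {"time": timestamp, "stream": stream, "partial": tag == "P", "message": message}
```

Real agents such as Fluent Bit do this with built-in parsers and then add Kubernetes metadata; the sketch only illustrates why no change to the application is needed.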

What is Loki?

Loki is a log aggregation system optimized around labels and efficient log storage. It is commonly paired with Grafana and Promtail or Fluent Bit for Kubernetes log querying.

What is the purpose of kube-state-metrics?

kube-state-metrics exports Kubernetes object state as Prometheus metrics, such as Deployment replicas, Node conditions, Pod phases, PVC status, and resource metadata. It complements runtime usage metrics.

What is the difference between node-exporter and kubelet metrics?

node-exporter exposes OS and host-level metrics such as CPU, disk, filesystem, and network. The kubelet exposes its own health and runtime metrics, plus per-Pod and per-container resource usage collected by its embedded cAdvisor.

How do you check events for a namespace?

Use kubectl get events -n <namespace>. Sorting helps when debugging: kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp. Events are short-lived signals (the API server keeps them for one hour by default), not durable logs.

What is an alerting rule in Prometheus?

An alerting rule evaluates a PromQL expression and fires when a condition is true for the configured duration. Examples include high restart rate, missing targets, high error rate, disk pressure, or SLO burn-rate conditions.
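The "true for the configured duration" part is easiest to see as a small state machine: an alert is pending while the condition has only recently become true, and firing once it has held for the whole duration. The sketch below is a simplified model for a single series, not Prometheus code; the function name and the sample timestamps are invented.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each rule evaluation.

    samples: list of (timestamp_seconds, condition_is_true) in time order.
    """
    states = []
    active_since = None  # when the condition most recently became true
    for timestamp, is_true in samples:
        if not is_true:
            # Any false evaluation resets the timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states
```

With evaluations every 30 seconds and a 60 second duration, a condition that is true at 0, 30, and 60 seconds is pending twice and then firing; a single false evaluation returns it to inactive and restarts the wait.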

What is the purpose of Alertmanager?

Alertmanager receives alerts from Prometheus, groups and deduplicates them, applies silences and inhibition rules, and routes notifications to systems such as email, Slack, PagerDuty, or webhooks.
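Grouping and deduplication can be pictured with a toy model: alerts with an identical label set are collapsed, and the remainder are bucketed by a chosen subset of labels so that one notification covers each bucket. This sketch is not Alertmanager's implementation; the function name is hypothetical, and its group_by argument only mirrors the idea of the group_by setting on a route.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Group alert label sets by the chosen labels, dropping exact duplicates."""
    groups = defaultdict(list)
    seen = set()
    for labels in alerts:
        fingerprint = tuple(sorted(labels.items()))
        if fingerprint in seen:
            continue  # same label set already received
        seen.add(fingerprint)
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

Grouping by alertname and namespace, for example, would fold two failing Pods of the same workload into one notification rather than two pages.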

What is the difference between push and pull metrics?

Pull metrics are scraped by Prometheus from endpoints. Push metrics are sent by applications or jobs to a gateway. Kubernetes commonly favors pull-based scraping; Pushgateway is mostly for short-lived batch jobs with careful lifecycle handling.
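In the pull model the application only has to serve its current values over HTTP and wait to be scraped. The following is a minimal sketch using the Python standard library and the Prometheus text exposition format; the metric name app_requests_total, the sample counts, and the handler class are assumptions made for illustration, and a real service would more likely use an official client library.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical in-process counter: {(method, status): count}.
REQUESTS = {("GET", "200"): 42, ("GET", "500"): 3}


def render_metrics(counts) -> str:
    """Render counts in the Prometheus text exposition format."""
    lines = [
        "# HELP app_requests_total Requests handled, by method and status.",
        "# TYPE app_requests_total counter",
    ]
    for (method, status), value in sorted(counts.items()):
        lines.append(
            f'app_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Serve the current counters at /metrics for a scraper to pull."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

Passing MetricsHandler to http.server.HTTPServer on a port of your choosing would expose the endpoint. Nothing is sent anywhere until a scraper requests it, which is why a job that exits before the next scrape needs a push path instead.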

What is the purpose of liveness and readiness probes in monitoring?

Probes are health checks that the kubelet runs against containers. They are not metrics by themselves, but a failing liveness probe restarts the container and a failing readiness probe removes the Pod from Service endpoints. The results can be observed through Pod status, events, and exported metrics.

How do you enable audit logging in Kubernetes?

Configure an audit policy file and pass it to the API server with --audit-policy-file. You also need an audit backend: --audit-log-path for the log file backend or --audit-webhook-config-file for the webhook backend, depending on the cluster setup.

What is the purpose of the Downward API in monitoring?

The Downward API exposes Pod metadata such as name, namespace, labels, annotations, and node name to containers. Apps and agents can use that metadata to tag logs, metrics, and traces.
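On the application side, using that metadata can be as simple as reading environment variables and attaching them to each record. This is a minimal sketch that assumes the Pod spec maps metadata.name, metadata.namespace, and spec.nodeName to the variables POD_NAME, POD_NAMESPACE, and NODE_NAME; those variable names and the helper functions are hypothetical choices, not Kubernetes defaults.

```python
import json
import os

# Hypothetical variable names; the Pod spec would have to map them with
# fieldRef entries such as metadata.name, metadata.namespace, spec.nodeName.
FIELDS = {"pod": "POD_NAME", "namespace": "POD_NAMESPACE", "node": "NODE_NAME"}


def telemetry_tags(env=os.environ) -> dict:
    """Collect the metadata tags that are present in the environment."""
    return {tag: env[var] for tag, var in FIELDS.items() if var in env}


def log_line(message: str, env=os.environ) -> str:
    """Emit a JSON log record enriched with the Pod metadata tags."""
    return json.dumps({"message": message, **telemetry_tags(env)}, sort_keys=True)
```

Outside a cluster the variables are simply absent, so the record carries only the message, which keeps the same code usable in local runs.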
