Learning Path & Field Workflow — K8s SRE Reference

How to use

Read this page for orientation: what to learn, in what order, and how to work at a client site. Use the topic pages when you need exact commands, YAML, Helm values, Terraform snippets, or troubleshooting checks. The goal is to move from a mental model to a working implementation without assuming you already know every moving part.

Recommended Study Order

Cluster Mental Model

Start with the control plane, worker nodes, API flow, etcd, controllers, kubelet, and pod lifecycle. You should be able to explain where a request goes after kubectl apply.

Architecture · Internals · Worker nodes · Pod lifecycle
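One way to make the request path concrete: `kubectl apply` turns each manifest into HTTP requests against a REST path on the API server. The API server authenticates and authorizes the request, runs admission, and persists the object in etcd; the scheduler, controllers, and kubelet then act on it through watches. The minimal sketch below uses a hypothetical helper, `api_path`, to build that path from an object's apiVersion, plural resource name, namespace, and name. It assumes you already know the plural resource name, which kubectl normally looks up through API discovery.

```bash
# Hypothetical helper: build the REST path the API server uses for one object.
# Assumption: the caller supplies the plural resource name (kubectl normally
# gets it from API discovery; this sketch does no discovery).
# Core-group objects (apiVersion "v1") live under /api; named groups such as
# apps/v1, batch/v1, or networking.k8s.io/v1 live under /apis.
api_path() (
  api_version=$1   # e.g. v1 or apps/v1
  resource=$2      # plural resource name, e.g. pods or deployments
  namespace=$3     # empty string for cluster-scoped resources such as nodes
  name=$4
  case "$api_version" in
    */*) prefix="/apis/$api_version" ;;
    *)   prefix="/api/$api_version" ;;
  esac
  if [ -n "$namespace" ]; then
    printf '%s/namespaces/%s/%s/%s\n' "$prefix" "$namespace" "$resource" "$name"
  else
    printf '%s/%s/%s\n' "$prefix" "$resource" "$name"
  fi
)
```

For example, `api_path apps/v1 deployments shop web` prints `/apis/apps/v1/namespaces/shop/deployments/web` (`shop` and `web` are placeholder names). On a live cluster, `kubectl get --raw "$(api_path apps/v1 deployments shop web)"` fetches the stored object from that path, and `kubectl apply -f <file> -v=6` logs the URLs kubectl actually calls.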

Workloads From Scratch

Learn how apps actually run: Deployments, ReplicaSets, pods, probes, resources, ConfigMaps, Secrets, Jobs, DaemonSets, and StatefulSets.

Deployments · ConfigMaps and Secrets · StatefulSets
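A Deployment does not create pods directly: it creates a ReplicaSet named `<deployment>-<pod-template-hash>`, and that ReplicaSet creates pods named `<replicaset>-<random suffix>`. The sketch below uses two hypothetical helpers that reverse this naming convention to guess a pod's owners during triage. It assumes a Deployment-managed pod with a five-character suffix; StatefulSet pods (`<name>-<ordinal>`), DaemonSet pods, and Job pods follow other patterns, so treat the result as a hint only.

```bash
# Hypothetical helpers. Heuristic only: assumes the pod belongs to a
# ReplicaSet that a Deployment owns.
#   Pod:        <deployment>-<pod-template-hash>-<5-char suffix>
#   ReplicaSet: <deployment>-<pod-template-hash>

# Strip the random pod suffix to get the ReplicaSet name.
rs_from_pod() {
  printf '%s\n' "$1" | sed -E 's/-[a-z0-9]{5}$//'
}

# Strip both the pod-template-hash and the suffix to get the Deployment name.
deploy_from_pod() {
  printf '%s\n' "$1" | sed -E 's/-[a-z0-9]+-[a-z0-9]{5}$//'
}
```

`deploy_from_pod web-7c9d8b6f5d-x2x4k` prints `web`. The authoritative answer is the pod's owner reference: `kubectl get pod <pod> -n "$NS" -o jsonpath='{.metadata.ownerReferences[0].kind}/{.metadata.ownerReferences[0].name}'`.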

Networking and Access

Move from pod-to-pod traffic to Services, DNS, Ingress, load balancers, network policy, and CNI behavior. Many client incidents end up touching this layer.

Services · CoreDNS · Ingress · Network policies
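CoreDNS answers for a Service at `<service>.<namespace>.svc.<cluster-domain>`, and a StatefulSet pod behind a headless Service also gets `<pod>.<service>.<namespace>.svc.<cluster-domain>`. The hypothetical helpers below only build those names; they assume the default cluster domain `cluster.local` unless you pass another one.

```bash
# Hypothetical helpers: build the DNS names CoreDNS serves inside the cluster.
# Assumption: the cluster domain is "cluster.local" unless the last argument overrides it.

# <service>.<namespace>.svc.<cluster-domain>
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# <pod>.<headless-service>.<namespace>.svc.<cluster-domain>, for StatefulSet
# pods whose spec.serviceName points at a headless Service.
sts_pod_fqdn() {
  printf '%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "${4:-cluster.local}"
}
```

To test resolution from inside the cluster (assuming it can pull `busybox:1.36`): `kubectl run dns-test -n "$NS" --rm -it --restart=Never --image=busybox:1.36 -- nslookup "$(svc_fqdn web shop)"`. If the full name resolves but the short name does not, compare the pod's `/etc/resolv.conf` search list with the namespace you expected.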

Security, Storage, and Operations

Learn the parts that make client environments strict: RBAC, service accounts, pod security, certificates, storage classes, PVCs, node maintenance, and production troubleshooting.

RBAC · Pod security · PV/PVC · Drain and taint
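RBAC questions about a workload are easiest to answer from the identity the API server sees: a ServiceAccount token authenticates as the user `system:serviceaccount:<namespace>:<name>`. The small sketch below (hypothetical helper `sa_user`) builds that string so it can be passed to `kubectl auth can-i --as`.

```bash
# Hypothetical helper: the username a ServiceAccount authenticates as.
# RoleBinding subjects of kind ServiceAccount resolve to this same identity.
sa_user() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}
```

`kubectl auth can-i list secrets -n shop --as="$(sa_user shop web)"` answers yes or no for the placeholder `web` ServiceAccount in `shop`, and `kubectl auth can-i --list -n shop --as="$(sa_user shop web)"` prints everything it may do. Impersonation only works if your own user is allowed to impersonate, which client-issued credentials often are not.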

Delivery and Platform Automation

Package apps with Helm, deploy with ArgoCD or Flux, run CI/CD checks, and provision durable infrastructure with Terraform.

Helm · ArgoCD · CI/CD · Terraform
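A CI stage for this stack usually proves that charts render and Terraform is well-formed before anything reaches a cluster. The sketch below runs only local checks (`helm lint`, `helm template`, `terraform fmt -check`, `terraform validate`). The repository layout (`./chart`, `./chart/values-<env>.yaml`, `./infra`) and the release name `ci-check` are assumptions; adjust them to the client's repository.

```bash
# Hypothetical CI gate. Assumed layout: Helm chart in ./chart with per-environment
# overrides in ./chart/values-<env>.yaml, Terraform root module in ./infra.
# Nothing here talks to a cluster or applies infrastructure.
ci_checks() (
  target_env=${1:?"usage: ci_checks ENV"}
  values="./chart/values-$target_env.yaml"
  # Chained with && so the first failing check stops the gate.
  helm lint ./chart -f "$values" &&
    # Render locally; this YAML is what GitOps or helm upgrade would apply.
    helm template ci-check ./chart -f "$values" > "rendered-$target_env.yaml" &&
    # -backend=false installs providers without needing state-backend credentials.
    terraform -chdir=./infra init -backend=false -input=false &&
    terraform -chdir=./infra fmt -check -recursive &&
    terraform -chdir=./infra validate
)
```

Run it as `ci_checks dev`. Keeping `rendered-dev.yaml` as a build artifact lets reviewers diff the rendered manifests between commits, which is often clearer than diffing templates.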

From-Scratch Practice Build

Use this sequence to turn the reference into hands-on muscle memory. Each phase should leave behind a working artifact you can explain in an interview or adapt at a client site.

Phase	Build	You should be able to explain
Local cluster	Create a kind, minikube, k3d, or lab cluster.	Control plane vs worker node, kubeconfig, contexts, namespaces.
Basic app	Deploy an app with Deployment, Service, ConfigMap, Secret, probes, and resource requests.	Why the pod starts, how rollout works, and how Service routes traffic.
Ingress	Add an ingress controller and expose the app through HTTP.	Ingress object vs controller, DNS, load balancer, TLS termination.
Security	Add a ServiceAccount, Role, RoleBinding, NetworkPolicy, and pod security context.	Least privilege, namespace scope, traffic allow-lists, runtime constraints.
Helm	Convert manifests into a chart with environment-specific values.	Chart vs release, values precedence, template rendering, rollback.
GitOps	Deploy the Helm chart through ArgoCD or Flux.	Desired state, sync, drift, rollback, app-of-apps, promotion.
Observability	Install Prometheus and Grafana, then create basic app and cluster dashboards.	Metrics, labels, scraping, alert routing, dashboard debugging.
Cloud/IaC	Provision an EKS-style foundation with Terraform in a sandbox account.	State, plan/apply, modules, IAM, VPC, managed node groups, add-ons.
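For the Basic app phase, a hypothetical `render_app` helper can print a ConfigMap, Deployment, and Service for one container, so you can study each field before moving them into a chart in the Helm phase. It assumes the image serves HTTP on the port you pass and answers `GET /healthz`; the replica count, probe path, ConfigMap key, and resource values are placeholders, not sizing advice.

```bash
# Hypothetical helper: prints a ConfigMap, Deployment, and Service for one container.
# Assumptions: the image listens for HTTP on the given port and serves GET /healthz.
render_app() (
  name=$1; image=$2; port=$3
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: $name-config
data:
  LOG_LEVEL: info   # example key; replace with the app's real settings
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
  labels:
    app: $name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name   # the Service selector below matches this label
    spec:
      containers:
        - name: $name
          image: $image
          ports:
            - containerPort: $port
          envFrom:
            - configMapRef:
                name: $name-config
          readinessProbe:
            httpGet:
              path: /healthz
              port: $port
          livenessProbe:
            httpGet:
              path: /healthz
              port: $port
            initialDelaySeconds: 10
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
            limits:
              memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: $name
spec:
  selector:
    app: $name
  ports:
    - port: 80
      targetPort: $port
EOF
)
```

Preview with `render_app web registry.example.com/web:1.0.0 8080 | kubectl apply -n shop --dry-run=server -f -`, then drop `--dry-run=server` to create the objects; the image and namespace are placeholders. The Service only routes to pods whose labels match its `selector` and whose readiness probe passes.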

Client-Site Workflow

Identify ownership first: check whether the object is managed by Terraform, Helm, ArgoCD, Flux, or direct Kubernetes manifests.
Gather read-only evidence: context, namespace, events, logs, rollout history, recent Git changes, and relevant dashboards.
Use the topic page: read the TL;DR, start with the read-only commands, then compare symptoms with the troubleshooting section.
Prefer source-of-truth changes: update GitOps manifests, Helm values, or Terraform code unless the incident process permits an emergency patch.
Validate the result: watch rollout, confirm endpoints, check events, verify metrics, and record what changed.
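The ownership check in the first step can be partly scripted. The sketch below (hypothetical `detect_owner`) reads an object's YAML on stdin and reports `flux`, `argocd`, `helm`, or `unknown` from markers these tools write: Flux's `kustomize.toolkit.fluxcd.io/*` and `helm.toolkit.fluxcd.io/*` labels, Argo CD's `argocd.argoproj.io/*` annotations (for example `argocd.argoproj.io/tracking-id` under annotation-based tracking), and Helm's `app.kubernetes.io/managed-by: Helm` label or `meta.helm.sh/release-name` annotation. GitOps markers are checked first because Flux and Argo CD often deploy Helm charts, which carry Helm's labels too. With Argo CD's label-based tracking the only marker is `app.kubernetes.io/instance`, which many charts also set, so such objects may be reported as `helm`.

```bash
# Hypothetical helper: classify the likely owner of one object from its YAML on stdin.
# Order matters: GitOps tools often deploy Helm charts, so check them before Helm.
# Assumption: Terraform leaves no standard marker, so "unknown" means
# "check Terraform state and Git before patching", not "unmanaged".
detect_owner() (
  meta=$(cat)
  if printf '%s\n' "$meta" | grep -qE 'kustomize\.toolkit\.fluxcd\.io/|helm\.toolkit\.fluxcd\.io/'; then
    echo flux
  elif printf '%s\n' "$meta" | grep -q 'argocd\.argoproj\.io/'; then
    echo argocd
  elif printf '%s\n' "$meta" | grep -qE 'app\.kubernetes\.io/managed-by: Helm|meta\.helm\.sh/release-name'; then
    echo helm
  else
    echo unknown
  fi
)
```

Usage: `kubectl get deploy web -n "$NS" -o yaml | detect_owner`.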

incident-first-10-minutes.sh

kubectl config current-context
kubectl get ns

# Replace the namespace and app label before running in a client cluster.
# Keep the quotes: unquoted angle brackets are parsed as shell redirections.
NS="<namespace>"
APP="<app-label>"

kubectl get deploy,sts,ds,po,svc,ingress,pvc -n "$NS" -o wide
kubectl get events -n "$NS" --sort-by=.lastTimestamp | tail -n 40
kubectl describe deploy -n "$NS" -l app="$APP"
kubectl logs -n "$NS" -l app="$APP" --tail=100 --all-containers --prefix
kubectl rollout history deploy -n "$NS" -l app="$APP"

# Ownership clues.
kubectl get all,cm,secret,ingress,pvc -n "$NS" -l app="$APP" -o yaml | grep -E "app.kubernetes.io/(managed-by|instance)|argocd|fluxcd|helm.sh" | sort | uniq -c
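The triage script above gathers evidence; the last workflow step, validating the result, can be wrapped the same way. A minimal read-only sketch, assuming the Deployment and Service names are known (hypothetical helper `validate_rollout`):

```bash
# Hypothetical helper: read-only post-change checks for one Deployment and its Service.
validate_rollout() (
  ns=$1; deploy=$2; svc=$3
  # Wait for the rollout, but keep going so the evidence below is always printed.
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=180s
  rollout_rc=$?
  # Ready endpoints appear in EndpointSlices labelled with the Service name.
  kubectl get endpointslices -n "$ns" -l "kubernetes.io/service-name=$svc" -o wide
  # Warning events raised during or after the rollout.
  kubectl get events -n "$ns" --field-selector type=Warning --sort-by=.lastTimestamp
  # Report the rollout result so a runbook step can gate on it.
  exit "$rollout_rc"
)
```

`validate_rollout "$NS" web web` waits up to 180 seconds for the rollout, then always prints the Service's EndpointSlices and Warning events. If the EndpointSlices list no endpoints, check that the Service selector matches the pod labels and that the pods pass readiness.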

What Every Topic Page Should Eventually Contain

1. What it is: a concise explanation of the object or tool and the problem it solves.
2. How it works: the request flow, controller behavior, ownership boundaries, and common dependencies.
3. Build from scratch: a minimal working YAML, Helm, Terraform, or CI/CD example.
4. Production version: safer defaults, resource limits, probes, security, HA, observability, and rollback.
5. Troubleshooting: symptoms, commands, likely causes, and what to verify after a fix.