TL;DR

Classify the failure first: HTTP 403 from kube-apiserver → RBAC; timeouts or RSTs between Pods → NetworkPolicy/CNI path; Ingress TLS handshake errors → cert chain or SNI; admission rejections citing PodSecurity violations → PSA; failed Secret mounts → check that the Secret exists, that whatever syncs or reads it still has RBAC, and that the sync is current.

Diagnostic flow

[Diagram: four symptom branches: API deny, Net drop, TLS err, PSA / vol]

Layered clues: Forbidden strings in audit → bindings · Drop with no RST → NetPol egress default deny · Mounted volume empty → secret/key drift.

Group symptoms before deep-diving a single subsystem.

Signal | Often | Next artifact
forbidden ... cannot ... from kubectl | RBAC or admission webhook denial | kubectl auth can-i; API audit line for user
Works pod-to-ServiceIP, dies pod-to-pod by label subset | Namespace-scoped deny NetworkPolicy | CNI policy dump; temporary allow-all for repro in sandbox
TLS alert unknown CA / handshake failure | Wrong secret ref, stale cert-manager cert | kubectl describe certificate; ingress TLS stanza
Pod rejected at create time | Pod Security Admission at baseline or restricted level | Namespace labels pod-security.kubernetes.io/enforce
MountVolume.SetUp failed for a secret volume | Missing Secret or key; revoked RBAC for whatever syncs or reads the Secret | kubectl describe pod; verify SA token projection if used
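
The grouping in the table can be sketched as a small triage helper. This is a minimal illustration, not a complete classifier: classify_failure is a hypothetical function name, and the match patterns are assumptions based on commonly seen message fragments. PSA is checked before the generic "forbidden" match because PSA rejections also contain the word "forbidden".

```shell
#!/usr/bin/env bash
# Sketch: map a raw error line to the subsystem to inspect first.
# The patterns below are illustrative assumptions, not an exhaustive list.
classify_failure() {
  local msg="$1"
  case "$msg" in
    # PSA rejections also say "forbidden", so match them first.
    *"violates PodSecurity"*)                             echo "PSA" ;;
    *"MountVolume.SetUp failed"*)                         echo "secret-mount" ;;
    *[Ff]orbidden*)                                       echo "RBAC" ;;
    *"unknown ca"*|*"unknown CA"*|*"handshake failure"*)  echo "TLS" ;;
    *"timed out"*|*"Connection reset"*)                   echo "NetworkPolicy" ;;
    *)                                                    echo "unknown" ;;
  esac
}
```

Feeding it the first error line from kubectl output or a pod event gives a starting subsystem; anything that lands in "unknown" still needs manual reading.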

RBAC “forbidden”

bash rbac-can-i.sh
# Acting as the CI ServiceAccount inside namespace app.
# --as impersonates the subject to reproduce its view; your own user needs impersonate rights.
kubectl auth can-i create pods --as system:serviceaccount:app:cicd -n app
kubectl auth can-i --list --as system:serviceaccount:app:cicd -n app

kubectl describe rolebinding -n app
kubectl describe clusterrolebinding cluster-admin | head -n 40

NetworkPolicy drops

bash netpol-debug.sh
kubectl get networkpolicy -A
kubectl describe networkpolicy allow-dns -n kube-system  # Typical baseline.

kubectl exec src-pod -n app -- wget -qO- -T 3 http://10.245.12.88:8080/health || echo fail  # -T works with both GNU and BusyBox wget.
kubectl debug node/NODE_NAME -it --image=nicolaka/netshoot -- netstat -tan | grep 8080  # Needs permission to create debug pods on nodes.

# Cilium: hubble observe / cilium connectivity test (cluster-specific).
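
When a namespace runs default-deny egress, DNS needs its own allow rule. The following is a rough sketch that writes such a policy to a file for review. The policy name allow-dns-egress, the NS and OUT variables, and the k8s-app: kube-dns label are assumptions (the label matches a stock CoreDNS or kube-dns deployment in kube-system; clusters using NodeLocal DNSCache or a custom DNS setup will need different selectors).

```shell
#!/usr/bin/env bash
# Sketch: generate an egress policy allowing DNS for every pod in a namespace.
# NS and OUT are placeholders; the kube-dns selector is an assumption about the cluster.
NS="${NS:-app}"
OUT="${OUT:-allow-dns-egress.yaml}"

cat > "$OUT" <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: ${NS}
spec:
  podSelector: {}          # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP    # DNS falls back to TCP for large responses
          port: 53
EOF
echo "wrote $OUT"
```

Reviewing the file and then running kubectl apply --dry-run=server -f allow-dns-egress.yaml is a cautious way to validate it before applying for real.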

Cert & TLS issues

bash tls-chain.sh
ING=public-app.example.com
openssl s_client -connect "${ING}:443" -servername "${ING}" -brief </dev/null

kubectl describe ingress public-app -n edge | sed -n '/TLS/,/Rules/p'
kubectl get certificate -A

yaml ingress-tls-ref.yaml
spec:
  tls:
    - hosts:
        - public-app.example.com
      secretName: public-app-tls  # Secret must hold tls.crt and tls.key; put the full chain in tls.crt.
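
A stale certificate in the referenced Secret is one of the more common causes, so an expiry check on the extracted tls.crt can help. The sketch below assumes openssl is installed; cert_expires_within is a hypothetical helper name, and the Secret name and namespace in the comment are taken from the examples in this section.

```shell
#!/usr/bin/env bash
# Sketch: report whether a PEM certificate expires within N days.
# cert_expires_within is a hypothetical helper, not part of any tool.
# Returns 0 if the cert expires within the window, 1 if not, 2 if unreadable.
cert_expires_within() {
  local pem="$1" days="$2"
  # Bail out early if the file is not a parseable certificate.
  openssl x509 -noout -in "$pem" >/dev/null 2>&1 || return 2
  # -checkend exits 0 when the cert is still valid after the given seconds.
  if openssl x509 -noout -checkend "$(( days * 86400 ))" -in "$pem" >/dev/null; then
    return 1
  fi
  return 0
}

# Example extraction (requires cluster access, so left commented out):
#   kubectl get secret public-app-tls -n edge -o jsonpath='{.data.tls\.crt}' | base64 -d > /tmp/tls.crt
#   cert_expires_within /tmp/tls.crt 14 && echo "renew soon"
```

Only the first certificate in the file is inspected, so this checks the leaf and says nothing about intermediates in the chain.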

PSA blocked pods

bash psa-ns.sh
kubectl label namespace workloads --list | grep pod-security.kubernetes.io/  # Shows enforce/audit/warn levels.
kubectl get events -n workloads --field-selector reason=FailedCreate

kubectl apply --dry-run=server -f deploy.yaml -n workloads
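
To script the label check, the level for a given mode can be pulled out of key=value label lines. This is a small sketch: psa_level is a hypothetical helper that reads label lines on stdin, and it assumes one key=value pair per line.

```shell
#!/usr/bin/env bash
# Sketch: extract the Pod Security Admission level for one mode from label lines.
# psa_level is a hypothetical helper; it expects key=value lines on stdin.
psa_level() {
  local mode="${1:-enforce}"
  # Exact key match, so enforce-version does not shadow enforce.
  awk -F= -v key="pod-security.kubernetes.io/${mode}" '$1 == key { print $2 }'
}
```

Piping the output of kubectl label namespace workloads --list into psa_level enforce prints the enforced level, or nothing if the namespace has no enforce label.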

Secret mount failures

bash secret-mount.sh
kubectl describe pod bad-mount -n app | sed -n '/Volumes:/,/QoS Class/p'

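# The kubelet mounts Secret volumes with node credentials, so this check matters only if the app or a sync controller reads the Secret via the API.
kubectl auth can-i get secret/db-creds --as system:serviceaccount:app:api -n app
kubectl get secret db-creds -n app -o jsonpath='{.data}' | wc -c  # Sanity: non-empty.

# Sync generators (ESO, CSI) lagging: compare the Secret's resourceVersion and last-update time against the source.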

Gotchas

  • Aggregated ClusterRoles: the built-in view role excludes Secrets while edit includes them, so do not assume Secret reads from the role name alone; CRDs reach these roles only through their own aggregated rules.
  • DNS requires an explicit allow when a namespace has default-deny egress; the breakage looks like flaky external calls but is really blocked UDP 53.
  • TLS secret keys must be canonical (tls.crt / tls.key); some tooling emits only ca.crt, which surfaces as chain issues.
  • Expired projected service account tokens: check the audience and file rotation if mounts go empty mid-flight.