TL;DR

Capacity planning prevents surprises. Inventory what you have, measure what you're using, project growth, and keep enough headroom to handle one AZ failure plus a traffic spike. Use VPA recommendations for right-sizing and autoscaler metrics for trending.

Resource Inventory

These commands give you a cluster-wide picture of allocatable capacity versus what is actually requested by workloads.

inventory.sh (bash)
# Allocatable capacity per node
kubectl get nodes -o custom-columns=\
"NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory,\
OS:.status.nodeInfo.operatingSystem,INSTANCE:.metadata.labels.node\.kubernetes\.io/instance-type"

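# Total cluster allocatable across nodes without a NoSchedule taint
# (assumes CPU is reported in cores or millicores and memory in Ki, as the kubelet typically does)
kubectl get nodes -o json | jq '
  [.items[] | select(all(.spec.taints[]?; .effect != "NoSchedule")) | .status.allocatable] |
  {cpu_millicores: (map(.cpu | if endswith("m") then rtrimstr("m") | tonumber else tonumber * 1000 end) | add),
   memory_gib:     (map(.memory | rtrimstr("Ki") | tonumber) | add / 1048576)}'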

# Requested vs allocatable per node (shows headroom)
kubectl describe nodes | grep -A 5 "Allocated resources"

# Per-namespace resource quota
kubectl get resourcequota -A -o wide

# Top consumers (requires metrics-server)
kubectl top nodes
kubectl top pods -A --sort-by=memory | head -20
kubectl top pods -A --sort-by=cpu    | head -20
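
The allocatable values above come back as Kubernetes quantity strings (for example `3920m` CPU or `15728640Ki` memory), so comparing or summing them in a script means normalising units first. The helpers below are a minimal sketch of that step; the function names are hypothetical, and they only handle whole cores, millicores, plain bytes, and the `Ki`/`Mi`/`Gi` suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of kubectl): normalise quantity strings to integers.

# CPU quantity ("4" or "3920m") -> millicores. Fractional cores such as "0.5" are not handled.
cpu_to_millicores() {
  case "$1" in
    *m) echo "${1%m}" ;;
    *)  echo $(( $1 * 1000 )) ;;
  esac
}

# Memory quantity ("15728640Ki", "512Mi", "16Gi", or plain bytes) -> MiB, rounded down.
mem_to_mib() {
  case "$1" in
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *)   echo $(( $1 / 1048576 )) ;;
  esac
}

cpu_to_millicores 3920m
mem_to_mib 15728640Ki
```

With cluster access, the same functions can be fed from `kubectl get nodes -o jsonpath=...` output one value at a time.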

Utilisation PromQL

These queries express cluster and namespace utilisation as a fraction of allocatable capacity (multiply by 100 for a percentage); track them in a Grafana dashboard and alert when sustained utilisation exceeds your headroom threshold.

utilisation.promql (PromQL)
# CPU request utilisation (fraction of allocatable CPU that is requested)
sum(kube_pod_container_resource_requests{resource="cpu"})
  / sum(kube_node_status_allocatable{resource="cpu"})

# Memory request utilisation
sum(kube_pod_container_resource_requests{resource="memory"})
  / sum(kube_node_status_allocatable{resource="memory"})

# Actual CPU usage vs requests (right-sizing signal; well below 1 = over-requested, above 1 = under-requested)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
  / sum(kube_pod_container_resource_requests{resource="cpu"})

# Per-namespace CPU requests as a fraction of cluster allocatable CPU
sum by (namespace)(kube_pod_container_resource_requests{resource="cpu"})
  / scalar(sum(kube_node_status_allocatable{resource="cpu"}))

# Node CPU utilisation (actual usage)
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))

# Pods per node (scheduling density)
count by (node)(kube_pod_info{node!=""})

Headroom Rules

These thresholds represent safe operating limits; crossing them means you're one surge or node failure away from degradation.

| Resource | Warning threshold | Action |
|---|---|---|
| CPU requests / allocatable | > 70% | Add nodes or right-size workloads with VPA |
| Memory requests / allocatable | > 80% | Add nodes; OOM risk increases above this point |
| Node count headroom | < 1 spare node per AZ | Scale up; otherwise you can't drain a node for maintenance |
| PVC usage | > 80% | Expand PVC or clean up data |
| etcd database size | > 6 GB with an 8 GB quota (etcd's default quota is 2 GB; 8 GB is the suggested maximum) | Compact and defragment etcd, or increase --quota-backend-bytes |
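
The percentage thresholds above can be checked mechanically once you have a used and a total figure for a resource. The function below is a minimal sketch of that comparison; `check_ratio` is a hypothetical name, and the sample numbers are illustrative, not measurements from a real cluster.

```shell
#!/usr/bin/env bash
# check_ratio NAME USED TOTAL WARN_PCT
# Prints WARN when USED/TOTAL exceeds WARN_PCT, otherwise OK.
# USED and TOTAL must be integers in the same unit (millicores, MiB, ...).
check_ratio() {
  local name=$1 used=$2 total=$3 warn=$4
  local pct=$(( used * 100 / total ))   # rounded down, for display only
  # Cross-multiply so integer rounding cannot hide a breach such as 70.4%.
  if [ $(( used * 100 )) -gt $(( warn * total )) ]; then
    echo "WARN ${name}: ${pct}% (threshold ${warn}%)"
  else
    echo "OK ${name}: ${pct}% (threshold ${warn}%)"
  fi
}

# Illustrative figures: 30000m requested of 38900m allocatable, 60 GiB of 142 GiB.
check_ratio "cpu requests" 30000 38900 70
check_ratio "memory requests" 60 142 80
```

In practice the used and total values would come from the PromQL queries or `kubectl` output shown earlier in this page.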

Node Capacity Math

Use this calculation to determine how many pods a node can fit given your average pod resource profile and the Kubernetes system overhead reservation.

node-capacity-math.txt
# Example: m5.xlarge (4 vCPU, 16 GB RAM) on EKS
#
# CPU allocatable (kube-reserved + system-reserved ~0.11 core typical):
#   4000m - 110m (reserved) ≈ 3890m
#   Average pod request = 100m → 38 pods by CPU
#
# Memory allocatable (overhead ~11% typical on EKS):
#   16 GB * 0.89 ≈ 14.2 GB
#   Average pod request = 256Mi → 56 pods by memory
#   → CPU is the bottleneck at 38 pods / node
#
# EKS also caps pods per node by ENI and IP count (VPC CNI without prefix delegation):
#   m5.xlarge: 4 ENIs * (15 IPs - 1) + 2 = 58 pods
#   → Actual max ≈ min(38, 58) = 38 pods / node
#
# Nodes needed for 200 pods with 30% headroom:
#   Pods with headroom = 200 / 0.70 ≈ 286
#   Nodes needed = ceil(286 / 38) = 8 nodes

# Check the actual allocatable values on a node
kubectl get node <node> -o jsonpath='{.status.allocatable}' | python3 -m json.tool
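
The arithmetic above can be scripted so it is easy to rerun for other instance types or pod profiles. This is a minimal sketch using integer shell arithmetic; the function names are hypothetical, and the inputs mirror the m5.xlarge example (3890m and roughly 14540 MiB allocatable, with 58 as the EKS max-pods value for that instance type under default VPC CNI settings).

```shell
#!/usr/bin/env bash
# pods_per_node ALLOC_CPU_M ALLOC_MEM_MIB POD_CPU_M POD_MEM_MIB IP_LIMIT
# Returns the smallest of the CPU-bound, memory-bound, and IP-bound pod counts.
pods_per_node() {
  local by_cpu=$(( $1 / $3 )) by_mem=$(( $2 / $4 )) ip_limit=$5
  local n=$by_cpu
  if [ "$by_mem" -lt "$n" ]; then n=$by_mem; fi
  if [ "$ip_limit" -lt "$n" ]; then n=$ip_limit; fi
  echo "$n"
}

# nodes_needed TOTAL_PODS PODS_PER_NODE HEADROOM_PCT
# Inflates the pod count so HEADROOM_PCT of capacity stays free, then rounds up to whole nodes.
nodes_needed() {
  local pods=$1 per_node=$2 headroom=$3
  local usable=$(( 100 - headroom ))
  local padded=$(( (pods * 100 + usable - 1) / usable ))   # ceil(pods / (1 - headroom))
  echo $(( (padded + per_node - 1) / per_node ))           # ceil(padded / per_node)
}

per_node=$(pods_per_node 3890 14540 100 256 58)
echo "pods per node: $per_node"
echo "nodes for 200 pods with 30% headroom: $(nodes_needed 200 "$per_node" 30)"
```

Integer division rounds the per-node count down and the node count up, which errs on the side of spare capacity.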

Right-sizing with VPA

VPA in recommendation mode (no auto-update) is the fastest way to identify over- and under-provisioned workloads without risk; review recommendations monthly.

vpa-recommendations.sh (bash)
# List VPA recommendations for all namespaces
kubectl get vpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{range .status.recommendation.containerRecommendations[*]}  {.containerName}: cpu={.target.cpu} mem={.target.memory}{"\n"}{end}{end}'

# Create a VPA in recommendation-only mode (safe — no automatic changes)
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # Off = recommendation only; Auto or Recreate = live updates
EOF
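
When reviewing recommendations, it helps to compare each container's current request with the VPA target and flag only meaningful gaps. The function below is a minimal sketch of that comparison; `rightsize_verdict` is a hypothetical name, and the 20% tolerance is an illustrative assumption rather than anything VPA defines.

```shell
#!/usr/bin/env bash
# rightsize_verdict CURRENT TARGET [TOLERANCE_PCT]
# CURRENT and TARGET are integers in the same unit (millicores or MiB).
# TOLERANCE_PCT defaults to 20; this default is an assumption for illustration.
rightsize_verdict() {
  local current=$1 target=$2 tol=${3:-20}
  if [ $(( current * 100 )) -gt $(( target * (100 + tol) )) ]; then
    echo "over-provisioned"
  elif [ $(( current * 100 )) -lt $(( target * (100 - tol) )) ]; then
    echo "under-provisioned"
  else
    echo "ok"
  fi
}

rightsize_verdict 500 120   # request 500m, VPA target 120m
rightsize_verdict 100 250   # request 100m, VPA target 250m
rightsize_verdict 256 240   # request 256Mi, VPA target 240Mi
```

The three sample calls cover a request well above the target, one well below it, and one inside the tolerance band.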