Capacity Planning
Capacity planning prevents surprises. Inventory what you have, measure what you're using, project growth, and keep enough headroom to handle one AZ failure plus a traffic spike. Use VPA recommendations for right-sizing and autoscaler metrics for trending.
Resource Inventory
These commands give you a cluster-wide picture of allocatable capacity versus what is actually requested by workloads.
# Allocatable capacity per node
kubectl get nodes -o custom-columns=\
"NAME:.metadata.name,CPU:.status.allocatable.cpu,MEM:.status.allocatable.memory,\
OS:.status.nodeInfo.operatingSystem,INSTANCE:.metadata.labels.node\.kubernetes\.io/instance-type"
# Allocatable CPU and memory per schedulable node (nodes with a NoSchedule taint excluded)
kubectl get nodes -o json | jq -r '
.items[] | select((.spec.taints // []) | all(.effect != "NoSchedule")) |
[.metadata.name, .status.allocatable.cpu, .status.allocatable.memory] | @tsv'
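Turning the per-node values into one cluster-wide figure needs unit parsing, because allocatable CPU may be reported as `4` or `3920m` and memory as `15896Ki`. The following is a minimal Python sketch of that step, not part of the original commands. The function names are hypothetical, and it assumes the input is the parsed JSON from `kubectl get nodes -o json`. It handles only the common quantity suffixes.

```python
# Suffixes commonly seen in Kubernetes memory quantities (binary and decimal).
_MEM_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
              "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '3920m' or '4' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def mem_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '15896Ki' or '16Gi' to bytes."""
    for suffix, factor in _MEM_UNITS.items():
        if quantity.endswith(suffix):
            return int(float(quantity[:-len(suffix)]) * factor)
    return int(quantity)  # plain bytes, no suffix


def schedulable(node: dict) -> bool:
    """True when the node carries no taint with effect NoSchedule."""
    taints = node.get("spec", {}).get("taints") or []
    return all(t.get("effect") != "NoSchedule" for t in taints)


def total_allocatable(node_list: dict) -> tuple:
    """Sum allocatable (millicores, bytes) over schedulable nodes."""
    cpu = mem = 0
    for node in node_list["items"]:
        if not schedulable(node):
            continue
        alloc = node["status"]["allocatable"]
        cpu += cpu_to_millicores(alloc["cpu"])
        mem += mem_to_bytes(alloc["memory"])
    return cpu, mem
```

One way to use it is to pipe `kubectl get nodes -o json` into a script that calls `total_allocatable(json.load(sys.stdin))` and prints the result.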
# Requested vs allocatable per node (shows headroom)
kubectl describe nodes | grep -A 5 "Allocated resources"
# Per-namespace resource quota
kubectl get resourcequota -A -o wide
# Top consumers (requires metrics-server)
kubectl top nodes
kubectl top pods -A --sort-by=memory | head -20
kubectl top pods -A --sort-by=cpu | head -20
Utilisation PromQL
These queries express cluster and namespace utilisation as a fraction of allocatable capacity (multiply by 100 for a percentage). Track them in a Grafana dashboard and alert when sustained utilisation exceeds your headroom threshold.
# CPU request utilisation (fraction of allocatable CPU that is requested)
sum(kube_pod_container_resource_requests{resource="cpu"})
/ sum(kube_node_status_allocatable{resource="cpu"})
# Memory request utilisation
sum(kube_pod_container_resource_requests{resource="memory"})
/ sum(kube_node_status_allocatable{resource="memory"})
# Actual CPU usage vs requests (right-sizing signal; well below 1 = over-requested, above 1 = under-requested)
sum(rate(container_cpu_usage_seconds_total{container!=""}[5m]))
/ sum(kube_pod_container_resource_requests{resource="cpu"})
# Per-namespace CPU requests as a fraction of cluster allocatable
sum by (namespace)(kube_pod_container_resource_requests{resource="cpu"})
/ scalar(sum(kube_node_status_allocatable{resource="cpu"}))
# Node CPU utilisation (actual usage)
1 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m]))
# Pods per node (scheduling density)
count by (node)(kube_pod_info{node!=""})
Headroom Rules
These thresholds represent safe operating limits; crossing them means you're one surge or node failure away from degradation.
| Resource | Warning threshold | Action |
|---|---|---|
| CPU requests / allocatable | > 70% | Add nodes or right-size workloads with VPA |
| Memory requests / allocatable | > 80% | Add nodes — OOM risk increases above this point |
| Node count headroom | < 1 spare node per AZ | Scale up; can't drain a node for maintenance |
| PVC usage | > 80% | Expand PVC or clean up data |
| etcd database size | > 75% of --quota-backend-bytes (etcd default 2 GB; 8 GB suggested maximum) | Compact and defragment etcd, or increase --quota-backend-bytes |
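The ratio-based rows of the table can be encoded as data and checked mechanically, for example in a periodic report. The sketch below is illustrative only. The metric names (`cpu_requests_ratio` and so on) and both function names are hypothetical, and the values are assumed to arrive as fractions between 0 and 1 from whatever monitoring source you use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str         # hypothetical metric key
    threshold: float  # warning threshold as a fraction
    action: str       # action text from the headroom table


RULES = (
    Rule("cpu_requests_ratio", 0.70, "Add nodes or right-size workloads with VPA"),
    Rule("memory_requests_ratio", 0.80, "Add nodes"),
    Rule("pvc_usage_ratio", 0.80, "Expand PVC or clean up data"),
)


def breached(metrics: dict) -> list:
    """Return (name, value, action) for every metric above its threshold."""
    findings = []
    for rule in RULES:
        value = metrics.get(rule.name)
        if value is not None and value > rule.threshold:
            findings.append((rule.name, value, rule.action))
    return findings


def azs_without_spare_node(spare_nodes_per_az: dict) -> list:
    """Return the AZs that have fewer than one spare node."""
    return sorted(az for az, spare in spare_nodes_per_az.items() if spare < 1)
```

Keeping the thresholds in one tuple means the report and the table can be reviewed side by side when the limits change.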
Node Capacity Math
Use this calculation to determine how many pods a node can fit given your average pod resource profile and the Kubernetes system overhead reservation.
# Example: m5.xlarge (4 vCPU, 16 GB RAM) on EKS
#
# CPU allocatable (kube-reserved + system-reserved ~0.11 core typical):
# 4000m - 110m (reserved) ≈ 3890m
# Average pod request = 100m → 38 pods by CPU
#
# Memory allocatable (overhead ~11% typical on EKS):
# 16 GB * 0.89 ≈ 14.2 GB
# Average pod request = 256Mi → 56 pods by memory
# → CPU is the bottleneck at 38 pods / node
#
# EKS also caps pods per node by ENI and IP count (VPC CNI without prefix delegation):
# m5.xlarge: 4 ENIs * (15 IPs - 1) + 2 = 58 pods
# → Actual max ≈ min(38, 56, 58) = 38 pods / node
#
# Nodes needed for 200 pods with 30% headroom:
# Pods with headroom = 200 / 0.70 ≈ 286
# Nodes needed = ceil(286 / 38) = 8 nodes
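The arithmetic above can be sketched as a few small Python functions so it is easy to rerun for other instance types or pod profiles. The function names are hypothetical. The ENI formula assumes the AWS VPC CNI without prefix delegation, and the reserved CPU and memory overhead figures are the approximate values from the example, not measured ones.

```python
import math


def eni_max_pods(enis: int, ips_per_eni: int) -> int:
    """EKS pod limit from ENI and IP counts (VPC CNI, no prefix delegation)."""
    return enis * (ips_per_eni - 1) + 2


def pods_per_node(cpu_alloc_m: int, mem_alloc_mi: int,
                  pod_cpu_m: int, pod_mem_mi: int, ip_limit: int) -> int:
    """Pods that fit on one node: the tightest of CPU, memory and IP limits."""
    by_cpu = cpu_alloc_m // pod_cpu_m
    by_mem = mem_alloc_mi // pod_mem_mi
    return min(by_cpu, by_mem, ip_limit)


def nodes_needed(pods: int, per_node: int, headroom: float) -> int:
    """Nodes required to run `pods` while keeping `headroom` (0-1) free."""
    return math.ceil(pods / (1 - headroom) / per_node)


# m5.xlarge example: ~110m CPU reserved, ~11% memory overhead (assumed values)
cpu_alloc_m = 4000 - 110                 # 3890m
mem_alloc_mi = int(16 * 1024 * 0.89)     # about 14.2 GiB expressed in MiB
per_node = pods_per_node(cpu_alloc_m, mem_alloc_mi, 100, 256,
                         eni_max_pods(4, 15))
print(per_node)                          # 38
print(nodes_needed(200, per_node, 0.30)) # 8
```

Replace the assumed allocatable figures with the real values reported by the node before relying on the result.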
# Check the actual allocatable values on a node
kubectl get node <node> -o jsonpath='{.status.allocatable}' | python3 -m json.tool
Right-sizing with VPA
VPA in recommendation mode (no auto-update) is the fastest way to identify over- and under-provisioned workloads without risk; review recommendations monthly.
# List VPA recommendations for all namespaces
kubectl get vpa -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{range .status.recommendation.containerRecommendations[*]} {.containerName}: cpu={.target.cpu} mem={.target.memory}{"\n"}{end}{end}'
# Create a VPA in recommendation-only mode (safe — no automatic changes)
kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"  # Off = recommendation only; Auto or Recreate = live updates
EOF
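During the monthly review, it can help to compare each VPA target with the requests a workload currently declares. The sketch below is one possible approach rather than a VPA feature. The function names and the 20% tolerance are assumptions. `vpa` is assumed to be a single VPA object parsed from `kubectl get vpa <name> -o json`, and `requests` is a hypothetical mapping of container name to its current resource requests.

```python
# Suffixes commonly seen in memory quantities (binary and decimal).
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
          "k": 10**3, "M": 10**6, "G": 10**9}


def parse_quantity(quantity: str) -> float:
    """Parse a CPU quantity to cores or a memory quantity to bytes."""
    if quantity.endswith("m"):           # millicores, e.g. '250m'
        return float(quantity[:-1]) / 1000
    for suffix, factor in _UNITS.items():
        if quantity.endswith(suffix):
            return float(quantity[:-len(suffix)]) * factor
    return float(quantity)


def classify(request: str, target: str, tolerance: float = 0.2) -> str:
    """Compare a current request with the VPA target for one resource."""
    ratio = parse_quantity(request) / parse_quantity(target)
    if ratio > 1 + tolerance:
        return "over-provisioned"
    if ratio < 1 - tolerance:
        return "under-provisioned"
    return "ok"


def review(vpa: dict, requests: dict, tolerance: float = 0.2) -> list:
    """Return (container, resource, verdict) for each comparable pair."""
    recs = (vpa.get("status", {})
               .get("recommendation", {})
               .get("containerRecommendations", []))
    findings = []
    for rec in recs:
        current = requests.get(rec["containerName"])
        if current is None:
            continue
        for resource in ("cpu", "memory"):
            if resource in current and resource in rec["target"]:
                verdict = classify(current[resource],
                                   rec["target"][resource], tolerance)
                findings.append((rec["containerName"], resource, verdict))
    return findings
```

A tolerance band avoids churn from small differences between the target and the declared request. Tune it to how often you are willing to redeploy.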