Cloud Cost Optimization
The biggest Kubernetes cost leaks are over-provisioned resources (requests much higher than actual usage) and idle nodes. Start by measuring actual usage with kubectl top and Prometheus, then right-size requests, use spot/preemptible nodes for fault-tolerant workloads, and enable the Cluster Autoscaler so idle nodes are removed.
Find Waste First
Before optimizing, measure. The commands and PromQL queries in this section show which workloads request far more CPU and memory than they actually use, which is the biggest source of wasted node capacity.
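Before running the queries, it may help to see the request-to-usage check on its own. The following is a minimal Python sketch, assuming hypothetical sample numbers (the workload names and values are invented for illustration); it applies the same "ratio > 3" threshold that the PromQL comment in this section uses.

```python
# Hypothetical sample data: CPU cores requested vs. average cores used.
# In practice these values would come from Prometheus, not be hard-coded.
workloads = {
    "team-a/api": {"requested": 2.0, "used": 0.4},
    "team-a/worker": {"requested": 1.0, "used": 0.8},
    "team-b/cron": {"requested": 0.5, "used": 0.0},
}

OVERPROVISIONED_RATIO = 3.0  # same threshold as the PromQL comment


def classify(requested: float, used: float) -> str:
    """Label a workload by its request-to-usage ratio."""
    if used == 0:
        # No measured usage at all: treat as idle and avoid dividing by zero.
        return "idle"
    ratio = requested / used
    return "over-provisioned" if ratio > OVERPROVISIONED_RATIO else "ok"


report = {name: classify(w["requested"], w["used"]) for name, w in workloads.items()}
for name, label in report.items():
    print(f"{name}: {label}")
```

Workloads labelled over-provisioned or idle are the first candidates for right-sizing or removal.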
# kubectl top: live utilization (requires metrics-server)
kubectl top nodes
kubectl top pods -A --sort-by=cpu
# PromQL: CPU request vs. actual usage (ratio > 3 means over-provisioned)
# container!="" drops the pod-level cgroup series so usage is not double-counted
#   sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod)
#   /
#   sum(rate(container_cpu_usage_seconds_total{container!=""}[30m])) by (namespace, pod)
# PromQL: pods with near-zero CPU usage over 24h (idle/dead workloads)
# running containers rarely report exactly 0, so a small threshold is used; tune it to your cluster
#   sum(rate(container_cpu_usage_seconds_total{container!=""}[24h])) by (namespace, pod) < 0.001
# VPA recommendation (shows the requests VPA would set)
kubectl get vpa -A
kubectl describe vpa <name> -n <ns>
# Goldilocks: runs VPA in recommendation mode per namespace
# https://github.com/FairwindsOps/goldilocks
# Dashboard via port-forward; assumes the default install in the "goldilocks" namespace
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80
Right-sizing Resource Requests
Set requests to the p95 of actual usage plus a 20% buffer. For CPU, keep limits at 2–3x requests so pods can burst. For memory, set limits equal to requests so memory is never overcommitted and memory-sensitive workloads are not OOM-killed or evicted when the node runs short.
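The rule above can be worked through by hand. This is a minimal Python sketch, assuming hypothetical usage samples (CPU in millicores, memory in MiB) and a nearest-rank p95; the helper names `percentile` and `right_size` are illustrative, and a CPU limit factor of 2 is chosen from the 2–3x range.

```python
def percentile(samples, pct):
    """Nearest-rank percentile using integer math to avoid float rounding."""
    ordered = sorted(samples)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100)
    return ordered[max(rank, 1) - 1]


def right_size(samples, buffer_pct=20, limit_factor=1):
    """Return request = p95 + buffer (rounded up) and limit = request * factor."""
    p95 = percentile(samples, 95)
    request = -(-p95 * (100 + buffer_pct) // 100)  # ceil(p95 * 1.2)
    return {"request": request, "limit": request * limit_factor}


# Hypothetical samples; real ones would come from Prometheus or VPA history.
cpu_millicores = [120, 150, 135, 400, 160, 145, 155, 170, 140, 165,
                  150, 158, 162, 149, 151, 300, 147, 153, 156, 144]
memory_mib = [200, 210, 205, 220, 215, 230, 225, 240, 235, 250]

cpu = right_size(cpu_millicores, limit_factor=2)  # burstable CPU
mem = right_size(memory_mib, limit_factor=1)      # memory limit == request

print(f"cpu: request={cpu['request']}m limit={cpu['limit']}m")
print(f"memory: request={mem['request']}Mi limit={mem['limit']}Mi")
```

With these samples the CPU p95 is 300m, so the request becomes 360m with a 720m limit; the single 400m spike is deliberately left to the burstable limit rather than baked into the request.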
# VPA in auto mode: automatically adjusts requests without manual tuning
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto" # use "Off" to only get recommendations without applying
  resourcePolicy:
    containerPolicies:
      - containerName: myapp
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: 2
          memory: 2Gi
Spot / Preemptible Nodes
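Use spot instances for stateless, fault-tolerant workloads such as batch jobs, CI runners, and horizontally scaled services; they typically cost 60–90% less than on-demand. Protect spot pods with PodDisruptionBudgets, and run spot capacity (for example, a node pool created with your provider's spot option, such as a --spot flag) alongside on-demand nodes in a mixed node group so evicted pods have somewhere to land.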
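To get a feel for what a mixed node group saves, here is a small Python sketch. The node counts, the $0.10 hourly price, and the 70% discount are assumptions picked for illustration (the discount sits inside the 60–90% range quoted above); real spot prices vary by instance type, region, and time.

```python
def hourly_cost(on_demand_nodes, spot_nodes, on_demand_price, spot_discount):
    """Hourly cost of a node group that mixes on-demand and spot nodes."""
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


PRICE = 0.10     # hypothetical on-demand price per node-hour, in dollars
DISCOUNT = 0.70  # hypothetical spot discount

baseline = hourly_cost(10, 0, PRICE, DISCOUNT)  # all on-demand
mixed = hourly_cost(3, 7, PRICE, DISCOUNT)      # 3 on-demand + 7 spot
savings = 1 - mixed / baseline

print(f"all on-demand: ${baseline:.2f}/h")
print(f"mixed:         ${mixed:.2f}/h")
print(f"savings:       {savings:.0%}")
```

Keeping a few on-demand nodes costs some of the savings but leaves a floor of capacity that a spot reclamation cannot remove.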
# Schedule a Deployment to prefer spot nodes (with on-demand fallback)
spec:
  template:
    spec:
      tolerations:
        - key: "kubernetes.azure.com/scalesetpriority" # AKS spot taint
          operator: "Equal"
          value: "spot"
          effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              preference:
                matchExpressions:
                  - key: "kubernetes.azure.com/scalesetpriority"
                    operator: In
                    values: ["spot"]
Karpenter for Cost-aware Provisioning
Karpenter is an alternative to the Cluster Autoscaler: it provisions nodes on demand from the specs of pending pods, picking the cheapest instance type that fits. Allow both spot and on-demand capacity and enable consolidation so that nodes are continuously right-sized.
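The idea of "cheapest instance type that fits" can be sketched in a few lines of Python. This is a toy illustration only, not Karpenter's actual algorithm, which batches pending pods and weighs many more constraints; the catalog entries, names, and prices are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class InstanceType:
    name: str
    cpu: float
    memory_gib: float
    hourly_price: float
    capacity_type: str  # "spot" or "on-demand"


# Hypothetical catalog; names and prices are made up for illustration.
CATALOG = [
    InstanceType("small-od", 2, 4, 0.08, "on-demand"),
    InstanceType("small-spot", 2, 4, 0.03, "spot"),
    InstanceType("large-od", 8, 32, 0.40, "on-demand"),
    InstanceType("large-spot", 8, 32, 0.14, "spot"),
]


def cheapest_fit(
    cpu: float,
    memory_gib: float,
    allowed: Sequence[str] = ("spot", "on-demand"),
) -> Optional[InstanceType]:
    """Return the lowest-priced instance type that satisfies the request."""
    candidates = [
        it for it in CATALOG
        if it.capacity_type in allowed
        and it.cpu >= cpu
        and it.memory_gib >= memory_gib
    ]
    return min(candidates, key=lambda it: it.hourly_price, default=None)


print(cheapest_fit(1.5, 3))
print(cheapest_fit(4, 16, allowed=("on-demand",)))
print(cheapest_fit(16, 64))  # nothing in the catalog is large enough
```

Restricting `allowed` plays the role of the capacity-type requirement: when spot is permitted it wins on price, and when it is excluded the same request falls back to on-demand.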
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"] # prefer spot, fall back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m # wait 1 minute after the last pod change on a node before consolidating it
Namespace Quotas
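ResourceQuotas and LimitRanges prevent any single team or namespace from consuming unbounded cluster resources, which is essential in multi-tenant clusters. A ResourceQuota caps the namespace's aggregate requests, limits, and object counts; a LimitRange supplies per-container defaults.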
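The admission decision a quota implies can be sketched in Python: a new pod is rejected if adding its requests would push any tracked resource past the hard cap. This is a simplified model under stated assumptions, not the API server's implementation; the caps mirror the request and pod values of the team-a quota in this section (CPU in millicores, memory in MiB), and the current usage figures are hypothetical.

```python
# Hard caps modelled on the team-a quota: 20 CPU, 40Gi memory, 100 pods.
HARD = {
    "requests.cpu": 20_000,          # millicores
    "requests.memory": 40 * 1024,    # MiB
    "pods": 100,
}


def exceeded(used: dict, pod: dict) -> list:
    """Return the quota keys that admitting `pod` would exceed."""
    return [
        resource
        for resource, cap in HARD.items()
        if used.get(resource, 0) + pod.get(resource, 0) > cap
    ]


# Hypothetical current usage in the namespace.
used = {"requests.cpu": 19_500, "requests.memory": 20_000, "pods": 40}

big_pod = {"requests.cpu": 1_000, "requests.memory": 512, "pods": 1}
small_pod = {"requests.cpu": 500, "requests.memory": 512, "pods": 1}

print(exceeded(used, big_pod))    # CPU would go to 20500m, above the cap
print(exceeded(used, small_pod))  # lands exactly on the cap, so it fits
```

A pod that lands exactly on the cap is admitted; only going over it triggers a rejection.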
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services.loadbalancers: "3"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
    - type: Container
      default: # applied when no limits are set
        cpu: 500m
        memory: 512Mi
      defaultRequest: # applied when no requests are set
        cpu: 100m
        memory: 128Mi
Cost Optimization Checklist
- Use spot/preemptible nodes for CI, batch, and stateless horizontally-scaled services.
- Enable cluster autoscaler or Karpenter with consolidation so idle nodes are terminated.
- Set resource requests accurately — over-requesting wastes node capacity.
- Deploy VPA in recommendation mode first; move to Auto after validating suggestions.
- Set LimitRange defaults so pods that omit requests are still admitted and counted against quota.
- Use Reserved Instances / Savings Plans / Committed Use Discounts (CUDs) for the stable baseline node count.
- Delete orphaned load balancers, persistent volumes, and snapshots.
- Prefer fewer, larger nodes over many small ones for better bin-packing efficiency and less per-node overhead.
- Use tools like Kubecost or OpenCost to attribute costs per namespace/team.
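The bin-packing item in the checklist can be illustrated with a short calculation. The sketch below assumes a hypothetical fixed overhead of 0.5 CPU per node (system reservations plus DaemonSets) and compares two fleets with the same 64 vCPUs in total; actual overhead depends on the provider and on what runs on every node.

```python
def allocatable_fraction(node_count, cpu_per_node, overhead_per_node):
    """Share of total CPU left for workloads after per-node overhead."""
    total = node_count * cpu_per_node
    usable = node_count * (cpu_per_node - overhead_per_node)
    return usable / total


OVERHEAD = 0.5  # hypothetical CPU reserved per node

many_small = allocatable_fraction(16, 4, OVERHEAD)  # 16 nodes x 4 vCPU
few_large = allocatable_fraction(4, 16, OVERHEAD)   # 4 nodes x 16 vCPU

print(f"16 x 4 vCPU: {many_small:.1%} usable")
print(f"4 x 16 vCPU: {few_large:.1%} usable")
```

Under these assumptions the larger nodes leave roughly 9 percentage points more CPU for workloads, before counting the fragmentation that small nodes add when pods do not divide evenly into them.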