Cloud Cost Optimization — K8s SRE Reference

TL;DR

The biggest Kubernetes cost leaks are over-provisioned resources (requests much higher than actual usage) and idle nodes. Start by measuring actual usage with kubectl top + Prometheus, then right-size requests, use spot/preemptible nodes for fault-tolerant workloads, and enable cluster autoscaler to terminate idle nodes.

Find Waste First

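Before optimizing, measure. The commands and PromQL queries below show which workloads request far more CPU than they actually use, which is the biggest source of wasted node capacity. The same pattern applies to memory: compare kube_pod_container_resource_requests{resource="memory"} with container_memory_working_set_bytes.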

bash: find-waste.sh

# kubectl top: live utilisation
kubectl top nodes
kubectl top pods -A --sort-by cpu

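# PromQL: CPU request vs actual usage (ratio > 3 means over-provisioned)
# container!="" skips the pod-level cgroup series so usage is not double-counted
# sum(kube_pod_container_resource_requests{resource="cpu"}) by (namespace, pod)
# /
# sum(rate(container_cpu_usage_seconds_total{container!=""}[30m])) by (namespace, pod)

# PromQL: pods with near-zero CPU usage over 24h (idle/dead workloads)
# 0.001 cores (1m) is an arbitrary cut-off; even idle processes rarely measure exactly 0
# sum(rate(container_cpu_usage_seconds_total{container!=""}[24h])) by (namespace, pod) < 0.001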

# VPA recommendation (shows what VPA would set limits/requests to)
kubectl get vpa -A
kubectl describe vpa <name> -n <ns>

# Goldilocks: runs VPA in recommendation mode per namespace
# https://github.com/FairwindsOps/goldilocks
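# Assumes the default Helm install into the "goldilocks" namespace
kubectl label ns <ns> goldilocks.fairwinds.com/enabled=true          # opt a namespace in
kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80   # dashboard on localhost:8080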
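
To make the ratio rule concrete, here is a minimal sketch that applies it to exported data. It assumes the two query results have already been joined into whitespace-separated lines of namespace, pod, requested cores, and used cores; that input format and the function name flag_overprovisioned are illustrative and not part of any tool.

```shell
#!/usr/bin/env bash
# Flag pods whose CPU request exceeds `threshold` times their measured usage.
# Input (stdin): namespace pod requested_cores used_cores   (hypothetical format)
flag_overprovisioned() {
  local threshold="${1:-3}"
  awk -v t="$threshold" '
    NF < 4 { next }                                   # skip malformed lines
    ($4 + 0) == 0 {                                   # no measured usage at all
      printf "%s/%s idle (requests %s cores, uses 0)\n", $1, $2, $3
      next
    }
    ($3 / $4) > t {                                   # request is more than t times usage
      printf "%s/%s over-provisioned ratio=%.1f\n", $1, $2, $3 / $4
    }
  '
}

# Example with made-up numbers: only the first and third pods are reported.
printf '%s\n' 'shop api 1.0 0.2' 'shop cache 0.5 0.4' 'batch old-job 0.25 0' \
  | flag_overprovisioned 3
```

Pods below the threshold produce no output, so an empty result means nothing crossed the line you chose.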

Right-sizing Resource Requests

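Set requests to the p95 of actual usage over a representative period, plus a 20% buffer. Keep CPU limits at 2–3x requests so pods can burst, and set memory limits equal to requests so memory is never overcommitted and memory-sensitive workloads are not OOM-killed when a node runs short.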
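
As a worked example of the p95-plus-buffer rule, the sketch below turns a list of usage samples into a suggested CPU request. It assumes one sample per line in millicores (for instance, exported from Prometheus) and uses the nearest-rank definition of p95; the function name suggest_request is hypothetical.

```shell
#!/usr/bin/env bash
# Suggest a CPU request in millicores: p95 of usage samples plus a percentage buffer.
# Input (stdin): one usage sample in millicores per line (hypothetical export).
# p95 uses the nearest-rank method: the value at position ceil(0.95 * n).
suggest_request() {
  local buffer_pct="${1:-20}"
  sort -n | awk -v b="$buffer_pct" '
    { v[NR] = $1 + 0 }
    END {
      if (NR == 0) exit 1                       # no samples, nothing to suggest
      rank = int((95 * NR + 99) / 100)          # integer form of ceil(0.95 * NR)
      printf "%d\n", v[rank] * (100 + b) / 100 + 0.5   # round to nearest millicore
    }'
}

# 20 samples from 10m to 200m: p95 is 190m, plus 20% gives 228m.
seq 10 10 200 | suggest_request 20
```

With a request of 228m, the CPU limit under the 2–3x guideline would land between roughly 456m and 684m.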

yaml: right-sized-resources.yaml

# VPA in auto mode: automatically adjusts requests without manual tuning
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: myapp-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  updatePolicy:
    updateMode: "Auto"   # evicts pods to apply new requests; use "Off" to only get recommendations
  resourcePolicy:
    containerPolicies:
    - containerName: myapp
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi

Spot / Preemptible Nodes

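Use spot instances for stateless, fault-tolerant workloads like batch jobs, CI runners, and horizontally scaled services; they typically cost 60–90% less than on-demand. Always protect spot pods with Pod Disruption Budgets, and run spot capacity alongside on-demand nodes (a mixed node group, or a spot pool created with your provider's spot option such as --spot) so there is a fallback when spot nodes are reclaimed.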

yaml: spot-tolerations.yaml

# Schedule a Deployment to prefer spot nodes (with on-demand fallback)
spec:
  template:
    spec:
      tolerations:
      - key: "kubernetes.azure.com/scalesetpriority"   # AKS spot taint
        operator: "Equal"
        value: "spot"
        effect: "NoSchedule"
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: "kubernetes.azure.com/scalesetpriority"
                operator: In
                values: ["spot"]
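
The Pod Disruption Budget recommended at the top of this section is not shown in the manifest above, so here is a minimal sketch of one. It assumes the pods carry an app=<name> label; that label, the default minAvailable of 50%, and the helper name render_pdb are illustrative choices.

```shell
#!/usr/bin/env bash
# Print a PodDisruptionBudget manifest for pods labelled app=<name>.
# The app label and the default minAvailable value are illustrative assumptions.
render_pdb() {
  local app="$1" min_available="${2:-50%}"
  cat <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ${app}-pdb
spec:
  minAvailable: ${min_available}
  selector:
    matchLabels:
      app: ${app}
EOF
}

# Print the manifest; to apply it, pipe the output into: kubectl apply -f -
render_pdb myapp 50%
```

A PDB only limits voluntary evictions such as node drains. It cannot stop the provider from reclaiming a spot node, so it works best alongside enough replicas spread over several nodes.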

Karpenter for Cost-aware Provisioning

Karpenter can replace the cluster autoscaler: it provisions nodes on demand from pending Pod specs, picking the cheapest instance type that fits. Allow both spot and on-demand capacity and enable consolidation so Karpenter continuously removes or replaces underutilized nodes.

yaml: nodepool.yaml

apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: ["spot", "on-demand"]   # prefer spot, fall back to on-demand
      - key: kubernetes.io/arch
        operator: In
        values: ["amd64"]
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized
    consolidateAfter: 1m    # wait 1 minute after the last pod add/remove on a node before consolidating it
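
To illustrate what "cheapest instance type that fits" means, here is a toy selection over a hand-written price list. It is a deliberately simplified sketch and not Karpenter's actual scheduling logic; the instance names, sizes, and hourly prices are all made up, and cheapest_fit is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Toy illustration of "cheapest instance type that fits" (not Karpenter's real algorithm).
# Input (stdin): name vcpu mem_gib hourly_price   (all values hypothetical)
# Arguments: CPU cores and memory (GiB) needed by the pending pods.
cheapest_fit() {
  local need_cpu="$1" need_mem="$2"
  awk -v c="$need_cpu" -v m="$need_mem" '
    ($2 + 0) >= c && ($3 + 0) >= m && (best == "" || ($4 + 0) < price) {
      best = $1; price = $4 + 0               # remember the cheapest candidate so far
    }
    END { if (best == "") exit 1; print best }  # non-zero exit when nothing fits
  '
}

# Pending pods need 3 cores and 8 GiB: the spot-priced medium wins.
cheapest_fit 3 8 <<'EOF'
small 2 4 0.05
medium 4 16 0.12
large 8 32 0.25
medium-spot 4 16 0.04
EOF
```

The example also shows why allowing both capacity types matters: the spot entry is chosen only because it is in the candidate list.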

Namespace Quotas

ResourceQuotas and LimitRanges prevent any single team or namespace from consuming unbounded cluster resources — essential in multi-tenant clusters.

yaml: quota.yaml

apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "20"
    requests.memory: 40Gi
    limits.cpu: "40"
    limits.memory: 80Gi
    pods: "100"
    services.loadbalancers: "3"
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-a
spec:
  limits:
  - type: Container
    default:          # applied when no limits are set
      cpu: 500m
      memory: 512Mi
    defaultRequest:   # applied when no requests are set
      cpu: 100m
      memory: 128Mi

Cost Optimization Checklist

Use spot/preemptible nodes for CI, batch, and stateless horizontally-scaled services.
Enable cluster autoscaler or Karpenter with consolidation so idle nodes are terminated.
Set resource requests accurately — over-requesting wastes node capacity.
Deploy VPA in recommendation mode first; move to Auto after validating suggestions.
Set LimitRange defaults so pods without explicit requests are admitted under a ResourceQuota and still count against it.
Use Reserved Instances / Savings Plans / CUDs for the stable baseline node count.
Delete orphaned load balancers, persistent volumes, and snapshots.
Prefer fewer, larger nodes over many small ones: bin-packing is more efficient and per-node overhead is paid fewer times.
Use tools like Kubecost or OpenCost to attribute costs per namespace/team.