Autoscaling
Use HPA for horizontal pod scaling on CPU, memory, or custom metrics. Use VPA (recommendation mode) for right-sizing pod resource requests. Use Cluster Autoscaler or Karpenter for node-level scaling. Use KEDA for event-driven scaling (queues, topics, cron). Don't run HPA and VPA in Auto mode against the same Deployment on the same resource metric (CPU or memory): each reacts to the other's changes, so they conflict.
Autoscaling Layers
HPA — Horizontal Pod Autoscaler
HPA adds or removes pod replicas based on observed metrics. Start with a CPU utilisation target, which is measured against each pod's CPU request, then layer in custom metrics (RPS, queue depth) once the baseline is stable.
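As a rough illustration of how a utilisation target turns into a replica count, the Kubernetes documentation gives the formula desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). The sketch below reproduces that arithmetic with integer percentages. The function name `hpa_desired_replicas` is hypothetical, and the sketch deliberately ignores the controller's tolerance band (10% by default), pod readiness handling, and any `behavior` policies.

```shell
# Hypothetical helper, not part of kubectl: approximates the HPA replica formula.
# usage: hpa_desired_replicas <current_replicas> <current_util_%> <target_util_%> <min> <max>
hpa_desired_replicas() {
  local current=$1 util=$2 target=$3 min=$4 max=$5
  # Integer ceiling of current * util / target
  local desired=$(( (current * util + target - 1) / target ))
  # Clamp to the minReplicas/maxReplicas bounds
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

hpa_desired_replicas 4 90 70 3 20   # 4 * 90 / 70 = 5.14, rounded up to 6
```

With 4 replicas averaging 90% of their CPU request against a 70% target, the sketch asks for 6 replicas; the real controller may then apply rate limits before reaching that number.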
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale up when avg CPU > 70% of request
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60   # require 60s of sustained demand before scaling up
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60   # add at most 4 pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down (avoids flapping)

kubectl get hpa -n production
kubectl describe hpa my-service-hpa -n production
# Shows: current metric value, desired replicas, last scale time, events
# Watch HPA in real time
kubectl get hpa my-service-hpa -n production -w
# Trigger scale manually (for testing only; revert minReplicas afterwards)
kubectl patch hpa my-service-hpa -n production --patch '{"spec":{"minReplicas":5}}'

VPA — Vertical Pod Autoscaler
Start VPA in Off mode to collect recommendations without making changes. Let it observe both peak and off-peak traffic for 7–14 days, review the recommendations, and only then consider switching to Auto.
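One simple way to review recommendations during the observation period is to check whether the current request falls inside the range VPA reports (lowerBound to upperBound). The helper below is a minimal sketch of that comparison for CPU in millicores. The name `vpa_check_request` and the three-way verdict are assumptions made for illustration; they are not part of VPA or kubectl.

```shell
# Hypothetical helper: classify a CPU request (millicores) against VPA bounds.
# usage: vpa_check_request <current_request_m> <lower_bound_m> <upper_bound_m>
vpa_check_request() {
  local request=$1 lower=$2 upper=$3
  if [ "$request" -lt "$lower" ]; then
    echo "raise"    # request is below lowerBound: likely under-provisioned
  elif [ "$request" -gt "$upper" ]; then
    echo "lower"    # request is above upperBound: likely over-provisioned
  else
    echo "ok"       # request sits inside the recommended range
  fi
}

vpa_check_request 1000 120 400   # prints "lower"
```

A request of 1000m against bounds of 120m to 400m is flagged as over-provisioned, which is the typical finding when requests were originally set by guesswork.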
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"   # Off | Initial | Recreate | Auto
  resourcePolicy:
    containerPolicies:
      - containerName: my-service
        minAllowed:
          cpu: 50m
          memory: 64Mi
        maxAllowed:
          cpu: "4"
          memory: 4Gi
        controlledResources: [cpu, memory]

# Read VPA recommendations
kubectl get vpa my-service-vpa -n production -o json | \
  jq '.status.recommendation.containerRecommendations[]'
# Shows: lowerBound, target, upperBound for CPU and memory

Cluster Autoscaler
Cluster Autoscaler (CA) watches for unschedulable pods and adds nodes from configured node groups; it removes underutilised nodes after a quiet period, but only if pods can be safely rescheduled elsewhere.
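To make "underutilised" concrete: CA judges a node by the sum of pod requests divided by allocatable capacity, taking the higher of the CPU and memory ratios, and compares it with a threshold (0.5 by default via `--scale-down-utilization-threshold`). The sketch below applies only that utilisation test. The function name `ca_scale_down_candidate` is hypothetical, and the other conditions CA checks before removing a node (PodDisruptionBudgets, local storage, whether pods fit elsewhere) are left out.

```shell
# Hypothetical helper: is a node a scale-down candidate by utilisation alone?
# usage: ca_scale_down_candidate <cpu_req_m> <cpu_alloc_m> <mem_req_Mi> <mem_alloc_Mi> [threshold_%]
ca_scale_down_candidate() {
  local cpu_req=$1 cpu_alloc=$2 mem_req=$3 mem_alloc=$4 threshold=${5:-50}
  local cpu_pct=$(( cpu_req * 100 / cpu_alloc ))
  local mem_pct=$(( mem_req * 100 / mem_alloc ))
  # Node utilisation is the higher of the CPU and memory request ratios
  local util=$cpu_pct
  if [ "$mem_pct" -gt "$util" ]; then util=$mem_pct; fi
  if [ "$util" -lt "$threshold" ]; then echo "candidate"; else echo "keep"; fi
}

ca_scale_down_candidate 1500 4000 6000 16000   # 37% CPU, 37% memory: prints "candidate"
```

Note that the calculation uses requests, not live usage, so a node full of idle pods with large requests is never considered underutilised.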
# Check Cluster Autoscaler status
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
# Watch CA logs for scale decisions
kubectl logs -n kube-system -l app=cluster-autoscaler -f | grep -i "scale\|node\|pending"
# EKS: tags for CA auto-discovery of node groups
# Ensure your node group has these tags:
#   k8s.io/cluster-autoscaler/enabled = true
#   k8s.io/cluster-autoscaler/<cluster-name> = owned
# Key CA flags to know:
#   --scale-down-utilization-threshold=0.5   (remove node if requested/allocatable utilisation < 50%)
#   --scale-down-delay-after-add=10m         (wait 10m after adding a node before scaling down)
#   --skip-nodes-with-local-storage=false    (allow removing nodes whose pods use emptyDir or hostPath)
#   --balance-similar-node-groups=true       (distribute evenly across similar groups)

Karpenter
Karpenter provisions nodes directly (without node groups) based on pod requirements; it selects the cheapest instance type that fits, supports Spot natively, and consolidates underutilised nodes automatically.
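The "cheapest instance type that fits" idea can be sketched as a filter followed by a minimum-price pick. This is a deliberate simplification: real Karpenter also bin-packs multiple pods, accounts for daemonset overhead, and weighs Spot availability. The function name `cheapest_fit`, the row format, and the catalogue entries are all hypothetical; the names and prices below are placeholders, not real instance types or quotes.

```shell
# Hypothetical helper: pick the cheapest catalogue row that fits the pending pods.
# stdin rows: <name> <vcpu> <memory_GiB> <price>
# usage: cheapest_fit <vcpu_needed> <mem_GiB_needed>
cheapest_fit() {
  awk -v cpu="$1" -v mem="$2" '
    # Keep a row only if it fits and is cheaper than the best seen so far
    $2 + 0 >= cpu + 0 && $3 + 0 >= mem + 0 && (best == "" || $4 + 0 < price) {
      best = $1; price = $4 + 0
    }
    END { if (best != "") print best; else exit 1 }
  '
}

# Placeholder catalogue: names and prices are made up for illustration
catalogue="small 2 8 10
medium 4 16 19
large 8 32 40"

printf '%s\n' "$catalogue" | cheapest_fit 3 12   # prints "medium"
```

When nothing in the catalogue fits, the sketch prints nothing and returns a non-zero status, which loosely mirrors a pod staying Pending because no allowed instance type can host it.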
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: [spot, on-demand]   # prefer spot; fall back to on-demand
        - key: kubernetes.io/arch
          operator: In
          values: [amd64]
        - key: node.kubernetes.io/instance-type
          operator: In
          values: [m5.large, m5.xlarge, m5.2xlarge, m6i.large, m6i.xlarge]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000   # max 1000 vCPU across all nodes in this NodePool
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidateAfter is omitted: in v1beta1 it is only accepted with WhenEmpty.
    # In karpenter.sh/v1, use WhenEmptyOrUnderutilized together with consolidateAfter.

kubectl get nodeclaims -A   # Karpenter-managed nodes
kubectl get nodepools       # pool definitions and limits
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f | grep -i "launch\|terminat"

KEDA — Event-driven Autoscaling
KEDA scales Deployments to zero and back based on event sources like SQS queue depth, Kafka lag, or a Prometheus query — ideal for batch processors and async workers.
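For a queue-length trigger, the replica count roughly tracks the number of visible messages divided by the per-replica target, clamped between the minimum and maximum replica counts. The sketch below shows that arithmetic only. The function name `keda_desired_replicas` is hypothetical, and polling intervals, the cooldown before scaling to zero, and in-flight messages are not modelled.

```shell
# Hypothetical helper: approximate replicas for a queue-length trigger.
# usage: keda_desired_replicas <messages_in_queue> <target_per_replica> <min> <max>
keda_desired_replicas() {
  local messages=$1 target=$2 min=$3 max=$4
  # Integer ceiling of messages / target
  local desired=$(( (messages + target - 1) / target ))
  # Clamp to the configured minimum and maximum replica counts
  if [ "$desired" -lt "$min" ]; then desired=$min; fi
  if [ "$desired" -gt "$max" ]; then desired=$max; fi
  echo "$desired"
}

keda_desired_replicas 95 10 0 50   # ceil(95 / 10) = 10
```

An empty queue yields 0 replicas when the minimum is 0, and a backlog of 1000 messages is capped at the maximum of 50 rather than the 100 the raw division suggests.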
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
  namespace: production
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0     # scale to zero when queue is empty
  maxReplicaCount: 50
  pollingInterval: 15    # check queue depth every 15 seconds
  cooldownPeriod: 300    # wait 5 min before scaling to zero
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456/my-queue
        queueLength: "10"    # scale up if more than 10 messages per replica
        awsRegion: us-east-1
        identityOwner: pod   # use pod's IRSA role

Troubleshooting
| Symptom | Check | Likely cause |
|---|---|---|
| HPA not scaling up | kubectl describe hpa — check Conditions | metrics-server not installed, no CPU requests set, maxReplicas already reached |
| HPA stuck at max | Check actual CPU usage vs request and limit | CPU request too low for real usage, so utilisation stays above target; CPU limit too low → throttled → pods stay saturated |
| Pods pending, no nodes added | kubectl describe pod — check events; CA logs | Node group max reached, Karpenter limits hit, Spot unavailable |
| VPA restarts pods too often | Check VPA mode | Mode is Recreate/Auto; switch to Off for production critical workloads |
| Cluster Autoscaler won't scale down | kubectl get configmap cluster-autoscaler-status | PDB blocking drain, local storage on node, cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation |