TL;DR

Use HPA for horizontal pod scaling on CPU/memory/custom metrics. Use VPA (recommendation mode) for right-sizing pod resource requests. Use Cluster Autoscaler or Karpenter for node-level scaling. Use KEDA for event-driven scaling (queues, topics, cron). Don't run HPA and VPA in Auto mode against the same Deployment on the same CPU or memory metric: VPA changes the requests that HPA's utilisation target is measured against, so the two conflict.

Autoscaling Layers

Diagram: traffic and events feed five autoscaling layers. HPA (replicas ↑↓), KEDA (event-driven scale), VPA (CPU/memory requests), Cluster Autoscaler (node groups), Karpenter (just-in-time nodes).

HPA — Horizontal Pod Autoscaler

HPA adds or removes pod replicas based on observed metrics. Start with CPU utilisation, which HPA measures as a percentage of each pod's CPU request, then layer in custom metrics (RPS, queue depth) once the baseline is stable.
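
For reference, the replica count HPA aims for follows the formula given in the Kubernetes HPA documentation. The second line is a worked example with illustrative numbers (4 replicas averaging 90% CPU against a 70% target), not figures from a real cluster.

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times
  \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
\]
\[
\left\lceil 4 \times \frac{90}{70} \right\rceil = \lceil 5.14 \rceil = 6
\]
```

The result is then clamped to minReplicas and maxReplicas and rate-limited by the behavior policies. By default HPA skips the change when the ratio is within 10% of 1.0, and with several metrics it uses the largest replica count any of them proposes.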

hpa.yaml (YAML)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70    # scale up when avg CPU > 70% of request

  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60    # act on the lowest recommendation from the last 60s (damps short spikes)
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60             # add at most 4 pods per minute
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down (avoids flapping)

hpa-debug.sh (Bash)
kubectl get hpa -n production
kubectl describe hpa my-service-hpa -n production
# Shows: current metric value, desired replicas, last scale time, events

# Watch HPA in real time
kubectl get hpa my-service-hpa -n production -w

# Trigger scale manually (for testing — don't leave set)
kubectl patch hpa my-service-hpa -n production --patch '{"spec":{"minReplicas":5}}'
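
Once the CPU baseline is stable, a custom metric can be added as another entry under spec.metrics. This is a minimal sketch, assuming a metrics adapter (for example Prometheus Adapter) already serves a per-pod metric through the custom metrics API. The metric name http_requests_per_second and the target of 100 are hypothetical placeholders.

```yaml
# Extra entry for spec.metrics in hpa.yaml (sketch, not a drop-in)
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second   # hypothetical metric name exposed by the adapter
    target:
      type: AverageValue
      averageValue: "100"              # illustrative: aim for about 100 RPS per pod
```

If the metric is missing from the custom metrics API, kubectl describe hpa reports it under Conditions, so that is the first place to look when this entry appears to have no effect.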

VPA — Vertical Pod Autoscaler

Start VPA in Off mode so it collects recommendations without changing anything. Let it observe for 7–14 days, long enough to cover peak and off-peak traffic, and review the recommendations before switching to Auto.

vpa.yaml (YAML)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Off"   # Off | Initial | Recreate | Auto
  resourcePolicy:
    containerPolicies:
    - containerName: my-service
      minAllowed:
        cpu: 50m
        memory: 64Mi
      maxAllowed:
        cpu: "4"
        memory: 4Gi
      controlledResources: [cpu, memory]

vpa-check.sh (Bash)
# Read VPA recommendations
kubectl get vpa my-service-vpa -n production -o json | \
  jq '.status.recommendation.containerRecommendations[]'
# Shows: lowerBound, target, upperBound for CPU and memory
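
One way to run VPA and HPA on the same Deployment without the conflict noted in the TL;DR is to give each a different resource. The sketch below assumes HPA is trimmed to scale on CPU only (the memory metric in hpa.yaml would need to be removed) and that VPA manages memory only. Whether this split suits a given workload is a judgment call, and Auto mode should only follow the Off-mode review period.

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-service-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  updatePolicy:
    updateMode: "Auto"                  # VPA may evict pods to apply new memory requests
  resourcePolicy:
    containerPolicies:
    - containerName: my-service
      controlledResources: ["memory"]   # leave CPU requests alone so HPA's target stays stable
      minAllowed:
        memory: 64Mi
      maxAllowed:
        memory: 4Gi
```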

Cluster Autoscaler

Cluster Autoscaler (CA) watches for unschedulable pods and adds nodes from configured node groups; it removes underutilised nodes after a quiet period, but only if pods can be safely rescheduled elsewhere.
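
Whether pods can be safely rescheduled is largely governed by PodDisruptionBudgets, which CA respects when it drains a node. This is a minimal sketch, assuming the my-service pods carry an app: my-service label; that label is an assumption, since the manifests here do not show pod labels.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-service-pdb
  namespace: production
spec:
  minAvailable: 2          # illustrative; keep it below the HPA minReplicas (3) so drains can proceed
  selector:
    matchLabels:
      app: my-service      # assumed pod label
```

A budget that allows zero disruptions, for example minAvailable equal to the current replica count, blocks scale-down of every node hosting those pods.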

cluster-autoscaler.sh (Bash)
# Check Cluster Autoscaler status
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml

# Watch CA logs for scale decisions
kubectl logs -n kube-system -l app=cluster-autoscaler -f | grep -i "scale\|node\|pending"

# EKS: auto-discovery tags for CA-managed node groups
# Ensure each node group's Auto Scaling group carries these tags:
# k8s.io/cluster-autoscaler/enabled = true
# k8s.io/cluster-autoscaler/<cluster-name> = owned

# Key CA flags to know:
# --scale-down-utilization-threshold=0.5   (node is a removal candidate if requested resources < 50% of allocatable)
# --scale-down-delay-after-add=10m         (wait 10m after adding a node before scaling down)
# --skip-nodes-with-local-storage=false    (allow draining nodes with emptyDir)
# --balance-similar-node-groups=true       (distribute evenly across similar groups)
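
These flags are passed as arguments to the cluster-autoscaler container. The fragment below is a sketch of how they might appear in its Deployment, assuming AWS with tag-based auto-discovery. The image is deliberately omitted and <cluster-name> is a placeholder to fill in.

```yaml
# Fragment of the cluster-autoscaler Deployment (under spec.template.spec); image omitted
containers:
- name: cluster-autoscaler
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # Discover Auto Scaling groups by the two tags listed above
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
  - --scale-down-utilization-threshold=0.5
  - --scale-down-delay-after-add=10m
  - --skip-nodes-with-local-storage=false
  - --balance-similar-node-groups=true
```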

Karpenter

Karpenter provisions nodes directly (without node groups) based on pod requirements; it selects the cheapest instance type that fits, supports Spot natively, and consolidates underutilised nodes automatically.

karpenter-nodepool.yaml (YAML)
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      requirements:
      - key: karpenter.sh/capacity-type
        operator: In
        values: [spot, on-demand]      # prefer spot; fall back to on-demand
      - key: kubernetes.io/arch
        operator: In
        values: [amd64]
      - key: node.kubernetes.io/instance-type
        operator: In
        values: [m5.large, m5.xlarge, m5.2xlarge, m6i.large, m6i.xlarge]
      nodeClassRef:
        apiVersion: karpenter.k8s.aws/v1beta1
        kind: EC2NodeClass
        name: default
  limits:
    cpu: 1000                    # max 1000 vCPU across all Karpenter nodes
  disruption:
    consolidationPolicy: WhenUnderutilized
    # consolidateAfter omitted: karpenter.sh/v1beta1 only accepts it with consolidationPolicy: WhenEmpty

karpenter-debug.sh (Bash)
kubectl get nodeclaims -A                   # Karpenter-managed nodes
kubectl get nodepools                       # pool definitions and limits
kubectl logs -n kube-system -l app.kubernetes.io/name=karpenter -f | grep -i "launch\|terminat"
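
The NodePool references an EC2NodeClass named default that is not shown. Here is a sketch of what it might look like under the same v1beta1 API, assuming subnets and security groups are tagged karpenter.sh/discovery: my-cluster and a node IAM role named KarpenterNodeRole-my-cluster exists. The cluster name, tag value and role name are all placeholders.

```yaml
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default                            # must match nodeClassRef.name in the NodePool
spec:
  amiFamily: AL2
  role: KarpenterNodeRole-my-cluster       # placeholder IAM role for the nodes
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster   # placeholder discovery tag
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: my-cluster
```

If no subnets or security groups match the selectors, pods stay Pending and no NodeClaim is launched.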

KEDA — Event-driven Autoscaling

KEDA scales Deployments to zero and back based on event sources like SQS queue depth, Kafka lag, or a Prometheus query — ideal for batch processors and async workers.

keda-scaledobject.yaml (YAML)
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaledobject
  namespace: production
spec:
  scaleTargetRef:
    name: sqs-worker
  minReplicaCount: 0            # scale to zero when queue is empty
  maxReplicaCount: 50
  pollingInterval: 15           # check queue depth every 15 seconds
  cooldownPeriod: 300           # wait 5 min before scaling to zero
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456/my-queue
      queueLength: "10"         # scale up if more than 10 messages per replica
      awsRegion: us-east-1
      identityOwner: pod        # use pod's IRSA role
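
The same ScaledObject shape works for the other event sources mentioned above. This is a sketch of a triggers list combining a Prometheus query with a cron schedule. The Prometheus address, query, threshold, timezone and schedule are all illustrative assumptions.

```yaml
# Alternative spec.triggers for a ScaledObject (sketch)
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus.monitoring.svc:9090            # assumed in-cluster address
    query: 'sum(rate(http_requests_total{app="my-service"}[2m]))'   # assumed metric and label
    threshold: "100"              # target value per replica
- type: cron
  metadata:
    timezone: Europe/London
    start: "0 8 * * 1-5"          # hold the replica floor from 08:00, Monday to Friday
    end: "0 18 * * 1-5"           # release it at 18:00
    desiredReplicas: "5"
```

With several triggers KEDA feeds each into the underlying HPA, so the trigger asking for the most replicas wins. For the SQS trigger, a queue of 120 messages with queueLength "10" works out to roughly 12 replicas, capped at maxReplicaCount.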

Troubleshooting

| Symptom | Check | Likely cause |
|---|---|---|
| HPA not scaling up | kubectl describe hpa (check Conditions) | metrics-server not installed, no CPU requests set, maxReplicas already reached |
| HPA stuck at max | Check actual CPU usage against request and limit | CPU limit too low → throttled → high utilisation; CPU request set too low |
| Pods pending, no nodes added | kubectl describe pod (check events); CA logs | Node group max reached, Karpenter limits hit, Spot unavailable |
| VPA restarts pods too often | Check VPA mode | Mode is Recreate/Auto; switch to Off for production-critical workloads |
| Cluster Autoscaler won't scale down | kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml | PDB blocking drain, local storage on node, cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation |
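
Several of the HPA rows above trace back to missing or mis-sized requests. Below is a sketch of the container resources block that HPA's Utilization targets depend on. The values are placeholders, to be replaced with figures from VPA recommendations or observed usage.

```yaml
# Fragment of the my-service Deployment; image and other fields omitted
spec:
  template:
    spec:
      containers:
      - name: my-service
        resources:
          requests:
            cpu: 250m        # HPA's 70% CPU target is measured against this value
            memory: 256Mi    # the 80% memory target is measured against this value
          limits:
            memory: 512Mi    # CPU limit left unset here to avoid throttling; a judgment call
```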