Kubernetes Cluster Upgrades
Upgrade one minor version at a time, control plane first and then workers, and always take an etcd backup before starting. Managed clusters (EKS/AKS/GKE) run the control plane upgrade for you, but you still own node group upgrades.
Version Skew Rules
Kubernetes has strict version compatibility rules that govern what you can and cannot skip during an upgrade.
| Component | Max skew from API server | Rule |
|---|---|---|
| kubelet | -3 minor versions | Never ahead of API server; at most 3 minor versions behind (2 for releases before 1.28) |
| kube-proxy | -3 minor versions | Same as kubelet |
| kubectl | ±1 minor version | kubectl can be one minor version ahead of or behind the API server |
| Upgrade path | One minor version at a time | 1.27 → 1.28 → 1.29; never skip a minor version |
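The kubelet rule in the table can be checked mechanically before touching a node. The following is a minimal sketch, not an official tool: `minor_of` and `kubelet_skew_ok` are hypothetical helper names, and the code assumes both versions share the same major version.

```shell
# Extract the minor number from a version such as "v1.30.2" or "1.30".
# Assumes the MAJOR.MINOR[.PATCH] layout that Kubernetes versions use.
minor_of() (
  v="${1#v}"        # drop a leading "v" if present
  v="${v#*.}"       # drop the major component and its dot
  echo "${v%%.*}"   # keep everything up to the next dot
)

# Succeed if a kubelet at version $2 may run against an API server at
# version $1: never newer, and at most $3 (default 3) minors older.
kubelet_skew_ok() (
  diff=$(( $(minor_of "$1") - $(minor_of "$2") ))
  [ "$diff" -ge 0 ] && [ "$diff" -le "${3:-3}" ]
)

# Example (commented out so the snippet has no side effects):
# kubelet_skew_ok v1.30.0 v1.27.4 && echo "within skew"
```

The optional third argument narrows the allowed window, which is useful if your own policy is stricter than the upstream one.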
Pre-upgrade Checklist
Run through this before any production upgrade; skipping steps is how upgrades cause unplanned downtime.
# 1. Confirm current version
kubectl version # the --short flag was removed in kubectl 1.28; short output is now the default
kubectl get nodes -o wide | awk '{print $1, $5}'
# 2. Read the changelog for your target version
# https://github.com/kubernetes/kubernetes/blob/master/CHANGELOG/CHANGELOG-1.XX.md
# 3. Check for deprecated APIs in use (before upgrading)
# Rough count of all namespaced objects; this sizes the cluster but does not flag deprecated APIs
kubectl api-resources --verbs=list --namespaced -o name | \
  xargs -I{} kubectl get {} -A --no-headers 2>/dev/null | wc -l
# To find deprecated or removed API versions, use pluto (github.com/FairwindsOps/pluto)
pluto detect-all-in-cluster
# 4. Backup etcd
ETCDCTL_API=3 etcdctl snapshot save /var/backups/etcd/pre-upgrade-$(date +%Y%m%d).db \
--endpoints=https://127.0.0.1:2379 \
--cacert=/etc/kubernetes/pki/etcd/ca.crt \
--cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
--key=/etc/kubernetes/pki/etcd/healthcheck-client.key
# 5. Check all nodes are Ready and no critical pods are degraded
kubectl get nodes
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
# 6. Check PodDisruptionBudgets — ensure PDBs allow at least one eviction
kubectl get pdb -A
kubectl get pdb -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: allowed={.status.disruptionsAllowed}{"\n"}{end}'
kubeadm Upgrade (Self-managed)
kubeadm handles the control plane components; upgrade one control-plane node at a time, then drain and upgrade each worker node.
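The commands in this section use the target version in two forms: the apt package version and the v-prefixed release that `kubeadm upgrade apply` takes. A minimal sketch of deriving the second from the first, so that only one variable needs editing; `kubeadm_version` is a hypothetical helper name, not part of kubeadm:

```shell
# Convert an apt package version such as "1.30.0-1.1" into the
# "v1.30.0" form passed to `kubeadm upgrade apply`.
kubeadm_version() {
  echo "v${1%%-*}"   # strip everything from the first "-" onward
}

# Example (commented out so the snippet has no side effects):
# sudo kubeadm upgrade apply "$(kubeadm_version "$TARGET_VERSION")"
```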
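# pkgs.k8s.io repositories are per minor version: point /etc/apt/sources.list.d/kubernetes.list at v1.30 first
TARGET_VERSION=1.30.0-1.1 # replace with your target; list candidates with: apt-cache madison kubeadm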
# ─── On the first control-plane node ───
# Unhold and upgrade kubeadm
sudo apt-mark unhold kubeadm
sudo apt-get update
sudo apt-get install -y kubeadm=$TARGET_VERSION
sudo apt-mark hold kubeadm
# Plan and review changes
sudo kubeadm upgrade plan
# Apply the upgrade
sudo kubeadm upgrade apply v1.30.0 # release version only, without the package revision suffix
# Upgrade kubelet and kubectl on THIS node
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet=$TARGET_VERSION kubectl=$TARGET_VERSION
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# ─── On additional control-plane nodes ───
# Upgrade the kubeadm package first (same unhold/install/hold steps as above), then:
sudo kubeadm upgrade node
# Then upgrade kubelet and kubectl as above
# ─── On each worker node ───
# (run from a machine with kubectl access to the cluster)
NODE=worker-1
kubectl cordon "$NODE"
kubectl drain "$NODE" --ignore-daemonsets --delete-emptydir-data --grace-period=60
# SSH to the worker node, then:
sudo apt-mark unhold kubeadm kubelet kubectl
sudo apt-get update
sudo apt-get install -y kubeadm=$TARGET_VERSION kubelet=$TARGET_VERSION kubectl=$TARGET_VERSION
sudo apt-mark hold kubeadm kubelet kubectl
sudo kubeadm upgrade node
sudo systemctl daemon-reload
sudo systemctl restart kubelet
# Back on the admin machine:
kubectl uncordon "$NODE"
kubectl get nodes # confirm node shows new version
EKS Upgrades (Managed)
EKS upgrades control plane and worker node groups separately; always upgrade the control plane first, then update the managed node group launch template to match.
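Because each control plane update moves only one minor version, reaching a target several minors ahead means repeating the whole sequence once per step. A minimal sketch of listing those steps; `upgrade_path` is a hypothetical helper name, and it assumes versions are written as MAJOR.MINOR with the same major on both sides:

```shell
# Print each minor version to step through, one per line, starting
# after the current version $1 and ending at the final target $2.
upgrade_path() (
  major="${1%%.*}"
  from="${1#*.}"; from="${from%%.*}"
  to="${2#*.}";   to="${to%%.*}"
  minor=$(( from + 1 ))
  while [ "$minor" -le "$to" ]; do
    echo "${major}.${minor}"
    minor=$(( minor + 1 ))
  done
)

# Example (commented out so the snippet has no side effects):
# for TARGET in $(upgrade_path 1.27 1.30); do
#   echo "upgrade control plane, add-ons, then node groups to $TARGET"
# done
```

Running the control plane, add-on, and node group steps inside such a loop keeps every component within the supported skew at each stage.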
CLUSTER=my-cluster
REGION=us-east-1
TARGET=1.30
# 1. Upgrade control plane (AWS-managed; takes ~10-15 min)
aws eks update-cluster-version \
--name "$CLUSTER" --region "$REGION" \
--kubernetes-version "$TARGET"
# Wait for completion
aws eks wait cluster-active --name "$CLUSTER" --region "$REGION"
aws eks describe-cluster --name "$CLUSTER" --region "$REGION" --query "cluster.version" --output text
# 2. Upgrade managed add-ons (vpc-cni, coredns, kube-proxy)
for ADDON in vpc-cni coredns kube-proxy; do
  LATEST=$(aws eks describe-addon-versions --region "$REGION" \
    --addon-name "$ADDON" --kubernetes-version "$TARGET" \
    --query "addons[0].addonVersions[0].addonVersion" --output text)
  # OVERWRITE resets customised add-on settings to defaults; PRESERVE keeps them
  aws eks update-addon --cluster-name "$CLUSTER" --region "$REGION" --addon-name "$ADDON" \
    --addon-version "$LATEST" --resolve-conflicts OVERWRITE
done
# 3. Upgrade managed node groups (replace AMI via rolling update)
aws eks update-nodegroup-version \
  --cluster-name "$CLUSTER" --region "$REGION" \
  --nodegroup-name workers \
  --kubernetes-version "$TARGET"
# Monitor node group status
aws eks describe-nodegroup --cluster-name "$CLUSTER" --region "$REGION" \
  --nodegroup-name workers --query "nodegroup.status"
AKS & GKE Upgrades (Summary)
# ─── AKS ───
# Upgrade control plane
az aks upgrade --resource-group my-rg --name my-cluster --kubernetes-version 1.30 --control-plane-only
# Upgrade a node pool
az aks nodepool upgrade --resource-group my-rg --cluster-name my-cluster \
--name nodepool1 --kubernetes-version 1.30 --no-wait
# ─── GKE ───
# Upgrade control plane (can set to auto-upgrade in channel)
gcloud container clusters upgrade my-cluster --master --cluster-version 1.30 --zone us-central1-a
# Upgrade node pool
gcloud container clusters upgrade my-cluster \
  --node-pool default-pool --cluster-version 1.30 --zone us-central1-a
Post-upgrade Validation
Run these checks immediately after each upgrade phase (control plane, then each node group) before proceeding.
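To make the node check a pass/fail gate between phases rather than something read by eye, the output of `kubectl get nodes --no-headers` can be filtered. This is a minimal sketch under the assumption that the default column order (NAME, STATUS, ROLES, AGE, VERSION) is in use; `nodes_not_upgraded` is a hypothetical helper name.

```shell
# Read `kubectl get nodes --no-headers` output on stdin and print every
# node whose STATUS is not exactly "Ready" or whose VERSION does not
# start with the expected prefix in $1 (for example "v1.30.").
# A still-cordoned node ("Ready,SchedulingDisabled") is reported too.
# Exits non-zero if at least one node is printed.
nodes_not_upgraded() {
  awk -v want="$1" '
    $2 != "Ready" || index($5, want) != 1 { print $1, $2, $5; bad = 1 }
    END { exit (bad ? 1 : 0) }
  '
}

# Example (commented out so the snippet has no side effects):
# kubectl get nodes --no-headers | nodes_not_upgraded v1.30. || echo "do not proceed"
```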
kubectl get nodes -o wide # all nodes Ready, correct version
kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded # no stuck pods
kubectl get cs 2>/dev/null # componentstatus (deprecated but useful sanity check)
kubectl get --raw='/readyz?verbose' # the API server's own readiness checks
# System pod health
kubectl get pods -n kube-system
# Check coredns and kube-proxy
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl get ds -n kube-system kube-proxy
# Validate DNS resolution
kubectl run dns-test --image=busybox:1.35 --restart=Never --rm -it -- \
nslookup kubernetes.default.svc.cluster.local
# Check all Deployments are at desired replicas
kubectl get deployments -A --no-headers | awk '{split($3, r, "/"); if (r[1] != r[2]) print}' # READY is "ready/desired"; prints rows where they differ
Gotchas
- Never skip minor versions. 1.27 → 1.29 is unsupported and can leave components in an inconsistent state.
- Deprecated APIs: check for removed APIs with `pluto` before upgrading, e.g., `batch/v1beta1` CronJob was removed in 1.25.
- PodDisruptionBudgets: PDBs that block all evictions will cause `kubectl drain` to hang. Check `.status.disruptionsAllowed` first.
- Add-on compatibility: CNI, CSI drivers, Ingress controllers, and cert-manager all have K8s version compatibility matrices, so upgrade them too.
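The PodDisruptionBudget check can be narrowed to only the budgets that would block a drain. A minimal sketch, assuming input lines of the form `namespace/name: allowed=N` as produced by the jsonpath query in the pre-upgrade checklist; `blocked_pdbs` is a hypothetical helper name.

```shell
# Read lines of the form "namespace/name: allowed=N" on stdin and print
# only the PDBs that currently permit zero evictions. A PDB whose
# status is not yet populated (empty N) is treated as blocked.
blocked_pdbs() {
  awk -F'allowed=' 'NF == 2 && $2 + 0 == 0 { sub(/: $/, "", $1); print $1 }'
}

# Example (commented out so the snippet has no side effects):
# kubectl get pdb -A -o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}: allowed={.status.disruptionsAllowed}{"\n"}{end}' | blocked_pdbs
```

Any name this prints is worth resolving (scale the workload up or relax the budget) before starting a drain.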