HA Control Plane with AWS NLB
Run an odd number of control-plane nodes (3 is typical) and front them with an internal AWS NLB on TCP 6443. Set kubeadm controlPlaneEndpoint to the NLB DNS name (with port 6443) before the first kubeadm init. Register every control-plane EC2 instance in the NLB target group. Workers join and kubectl connects through the NLB, never through individual node IPs.
Architecture & What NLB Replaces
A highly available Kubernetes control plane needs two things: etcd quorum (Raft consensus across the control-plane nodes) and a stable API address that does not move when one apiserver host fails. On bare metal or in small EC2 labs, teams often use a floating virtual IP (kube-vip, keepalived). On AWS, an internal NLB is the usual replacement: it exposes one DNS name, health-checks each apiserver on port 6443, and forwards TCP connections to healthy targets.
Every client that talks to the API — kubectl, worker kubelets, in-cluster controllers, CI jobs, and additional control-plane joins — must use the same endpoint you set at init time. Changing it after certificates are minted is painful; pick the NLB hostname before the first kubeadm init.
Figure 1 — Internal NLB fronts all apiservers on :6443. Workers and kubectl use the NLB DNS name; dashed purple lines show etcd Raft replication between control-plane nodes.
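As a quick illustration of the "same endpoint everywhere" rule, the sketch below compares the server URL in the active kubeconfig with the endpoint you expect. The function names are hypothetical helpers, not part of kubectl; only the kubectl config view call is a real command.

```shell
# Print the API server URL of the current kubeconfig context.
current_server() {
  kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'
}

# Succeed only if the server URL ($1) points at the expected host:port ($2).
# Hypothetical helper: it strips the https:// scheme and compares the rest.
uses_endpoint() {
  [ "${1#https://}" = "$2" ]
}

# Usage (endpoint value is a placeholder):
#   uses_endpoint "$(current_server)" "internal-abc123.elb.us-east-1.amazonaws.com:6443" \
#     || echo "kubeconfig does not point at the NLB"
```

A mismatch here usually means a kubeconfig was copied from a node before the NLB endpoint was in place.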
AWS Objects to Create
Wire these in Terraform (or equivalent) in the same VPC as your control-plane EC2 instances. Use an internal NLB unless you have a deliberate reason to expose the Kubernetes API on the public internet.
| Resource | Key settings | Why it matters |
|---|---|---|
| aws_lb | load_balancer_type = "network", internal = true | Layer-4 pass-through to apiserver; stable DNS name for kubeadm. |
| aws_lb_target_group | Protocol TCP, port 6443, target type instance | One registered target per control-plane EC2 instance. |
| aws_lb_target_group_attachment | Attach each aws_instance.control_plane[*] | NLB only routes to registered, healthy nodes. |
| aws_lb_listener | TCP 6443 → target group | Single front door for API traffic. |
| Output | dns_name of the NLB | Becomes controlPlaneEndpoint and join commands. |
| Security group rules | TCP 6443 on control-plane SG | The NLB in this sketch has no security group of its own (none is set at creation), so the instance rules must permit health-check probes and clients. |
Terraform Sketch
Place NLB resources in your cluster module after control-plane instances exist. Register targets by instance ID so replacements re-attach cleanly on the next apply.
resource "aws_lb" "control_plane" {
  name                             = "${var.project_name}-cp-nlb"
  internal                         = true
  load_balancer_type               = "network"
  subnets                          = var.nlb_subnet_ids # private subnets in VPC
  enable_cross_zone_load_balancing = true               # spread targets across AZs
}

resource "aws_lb_target_group" "control_plane_api" {
  name        = "${var.project_name}-cp-api"
  port        = 6443
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "instance"

  health_check {
    enabled  = true
    protocol = "TCP"
    port     = "6443"
  }
}

resource "aws_lb_target_group_attachment" "control_plane" {
  count            = var.control_plane_instance_count
  target_group_arn = aws_lb_target_group.control_plane_api.arn
  target_id        = aws_instance.control_plane[count.index].id
  port             = 6443
}

resource "aws_lb_listener" "control_plane_api" {
  load_balancer_arn = aws_lb.control_plane.arn
  port              = 6443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.control_plane_api.arn
  }
}

output "control_plane_endpoint" {
  # kubeadm wants host:port; strip the scheme if you add https later
  value = "${aws_lb.control_plane.dns_name}:6443"
}
Security Groups
After the NLB exists, confirm control-plane instances accept API traffic. A common pattern for a private cluster:
- Workers → API: allow TCP 6443 from the worker security group to the control-plane security group.
- Admins: allow 6443 from bastion or VPN CIDRs for kubectl.
- Health checks: allow 6443 from the VPC CIDR (or per-subnet CIDRs where NLB nodes evaluate targets).
- Control-plane ↔ control-plane: keep etcd (2379-2380), kubelet (10250), and other control-plane ports open within the control-plane SG (unchanged from single-node setups).
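If the security groups are not managed in Terraform, the 6443 rules above could be added with the AWS CLI. This is a minimal sketch: the two function names are hypothetical wrappers, and the security group IDs and CIDR in the usage comments are placeholders. The underlying aws ec2 authorize-security-group-ingress command and its flags are real.

```shell
# Allow TCP 6443 into the control-plane SG ($1) from another SG ($2),
# e.g. the worker security group.
allow_api_from_sg() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 6443 --source-group "$2"
}

# Allow TCP 6443 into the control-plane SG ($1) from a CIDR ($2),
# e.g. a VPN range or the VPC CIDR used by NLB health checks.
allow_api_from_cidr() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" --protocol tcp --port 6443 --cidr "$2"
}

# Usage (placeholder IDs and ranges):
#   allow_api_from_sg   sg-0controlplane sg-0workers
#   allow_api_from_cidr sg-0controlplane 10.0.0.0/16
```

Prefer the source-group form where possible; it keeps working when worker instances are replaced and their IPs change.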
kubeadm: controlPlaneEndpoint
Set the endpoint to the NLB DNS name before the first init. kubeadm bakes this hostname into API server certificates; all joins must match.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: my-cluster
# NLB DNS from terraform output, not a node IP or old floating VIP
controlPlaneEndpoint: "internal-abc123.elb.us-east-1.amazonaws.com:6443"
networking:
  podSubnet: "10.244.0.0/16" # must match your CNI (e.g. Cilium kubernetes IPAM)
  serviceSubnet: "10.96.0.0/12"
---
# Same API version as ClusterConfiguration; the map-style kubeletExtraArgs below is v1beta3 syntax
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  kubeletExtraArgs:
    cgroup-driver: systemd
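To avoid hand-copying the NLB hostname, the ClusterConfiguration could be rendered from the Terraform output. The sketch below assumes the control_plane_endpoint output from the Terraform sketch. render_cluster_config is a hypothetical helper, and the subnets are the example values from this page.

```shell
# Print a minimal ClusterConfiguration for the given host:port endpoint ($1).
# Rejects an empty value or one that carries a scheme such as https://.
render_cluster_config() {
  case "$1" in
    ""|*://*) echo "endpoint must be host:port" >&2; return 1 ;;
  esac
  cat <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "$1"
networking:
  podSubnet: "10.244.0.0/16"
  serviceSubnet: "10.96.0.0/12"
EOF
}

# Usage (run where the Terraform state is available):
#   render_cluster_config "$(terraform output -raw control_plane_endpoint)" > cluster-config.yaml
```

The scheme check guards against the most common slip, pasting an https:// URL where kubeadm expects host:port.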
On the first control-plane host, copy this file (or equivalent) and initialize:
sudo kubeadm init --config cluster-config.yaml --upload-certs
mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"
# install CNI (Cilium CLI or Helm) before expecting nodes Ready
Bootstrap Order
Follow dependency order so the NLB has at least one healthy target before you rely on it for joins. Target groups stay unhealthy until an apiserver listens on 6443.
Figure 2 — Recommended bootstrap sequence. NLB can exist before init, but targets become healthy only after the first apiserver listens on 6443.
| Step | Where | Action |
|---|---|---|
| 1 | Terraform | Apply VPC, control-plane EC2, internal NLB, target group, listener, SG rules. |
| 2 | First CP | kubeadm init --config cluster-config.yaml --upload-certs using NLB DNS in controlPlaneEndpoint. |
| 3 | First CP | Install CNI; verify kubectl get nodes. |
| 4 | Other CPs | kubeadm join <nlb-dns>:6443 --control-plane ... with token, CA hash, and --certificate-key. |
| 5 | Workers | kubeadm join <nlb-dns>:6443 ... — same endpoint as init. |
| 6 | Validation | NLB target health, etcd members, controlled CP failure test (below). |
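Between steps 3 and 4 it can help to confirm that the API actually answers through the NLB before attempting joins. The sketch below polls the apiserver /readyz endpoint, which a default kubeadm cluster normally serves to unauthenticated clients. wait_for_api is a hypothetical helper, and the retry count and 10-second interval are arbitrary assumptions. Run it from a bastion or worker rather than from a registered target: an instance connecting to itself through an internal NLB can time out when client IP preservation is on.

```shell
# Poll https://<endpoint>/readyz until it returns "ok" or the tries run out.
# $1 = host:port of the NLB, $2 = number of attempts (default 30).
wait_for_api() {
  endpoint=$1
  tries=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -k skips TLS verification: this is a reachability probe only.
    if [ "$(curl -sk --max-time 5 "https://${endpoint}/readyz")" = "ok" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  return 1
}

# Usage (placeholder hostname):
#   wait_for_api internal-abc123.elb.us-east-1.amazonaws.com:6443 || echo "API not reachable via NLB"
```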
Generate join commands from the first control-plane after CNI is healthy:
# worker join (print from control-plane-0)
kubeadm token create --print-join-command
# control-plane join — use NLB host, not node IP
sudo kubeadm join internal-abc123.elb.us-east-1.amazonaws.com:6443 \
  --control-plane \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --certificate-key <key-from-upload-certs>
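The control-plane join is the worker join command plus two flags, so it can be assembled rather than typed by hand. In this sketch cp_join_command is a hypothetical helper. The kubeadm subcommands in the usage comments are real, but taking the last line of the upload-certs output as the key is an assumption about its format.

```shell
# Build a control-plane join command from a worker join command ($1)
# and a certificate key ($2).
cp_join_command() {
  printf '%s --control-plane --certificate-key %s\n' "$1" "$2"
}

# Usage on the first control-plane node:
#   worker_join=$(kubeadm token create --print-join-command)
#   # Re-upload certs to get a fresh key if the one from init has expired.
#   cert_key=$(sudo kubeadm init phase upload-certs --upload-certs | tail -n 1)
#   cp_join_command "$worker_join" "$cert_key"
```

Check that the printed command targets the NLB hostname before running it on the joining node.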
Changing controlPlaneEndpoint on a live cluster requires updating API server certificates, kubeconfigs, and static pod configs. For learning, prefer a fresh cluster with the NLB endpoint set from day one.
HA Testing
Prove the NLB path, not just that three apiservers exist. Baseline first, then fail one control-plane at a time.
Baseline checks
# kubeconfig should point at NLB (or use --server override once)
kubectl get nodes
kubectl -n kube-system get pods
# on a control-plane node — etcd membership (stacked etcd with kubeadm)
sudo ETCDCTL_API=3 etcdctl member list \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
# AWS: target group should show healthy for each CP (CLI/console)
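The target-health part of the baseline can be scripted with the AWS CLI. This sketch wraps the real aws elbv2 describe-target-health command. The function names are hypothetical, and the target group ARN in the usage comment is a placeholder.

```shell
# Print the health state of every target in the group ($1 = target group ARN),
# whitespace-separated, e.g. "healthy healthy unhealthy".
target_states() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].TargetHealth.State' \
    --output text
}

# Count how many targets in the group report "healthy".
count_healthy() {
  n=0
  for s in $(target_states "$1"); do
    if [ "$s" = "healthy" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Usage (placeholder ARN): expect 3 for a three-node control plane.
#   count_healthy "$TARGET_GROUP_ARN"
```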
Controlled failure
# Option A: take one apiserver down (software failure).
# Stopping kubelet alone leaves the running kube-apiserver container up, so the
# NLB target stays healthy; moving the static pod manifest makes kubelet stop it.
ssh control-plane-1 'sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/'
# Option B: stop EC2 instance (host failure), in AWS console or CLI
# From your laptop: keep using NLB endpoint
kubectl get nodes --request-timeout=10s
kubectl -n kube-system get pods -l tier=control-plane
# Expect: one NLB target unhealthy; API still answers if 2-of-3 etcd quorum holds
# Restore: move the manifest back (or start the instance); only re-join if node was removed from cluster
ssh control-plane-1 'sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/'
For a 3-node control plane, etcd tolerates one failed member. Losing two simultaneously loses quorum — existing Pods may keep running, but the API stops accepting reliable writes.
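The tolerance figure follows from Raft majority arithmetic: a cluster of n members needs floor(n/2) + 1 of them to agree, so it survives floor((n-1)/2) failures. A small sketch of that calculation, with hypothetical function names:

```shell
# Members required for a majority in an n-member etcd cluster.
etcd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Members that can fail while a majority still remains.
etcd_fault_tolerance() {
  echo $(( ($1 - 1) / 2 ))
}

# Usage:
#   etcd_quorum 3            # 2
#   etcd_fault_tolerance 3   # 1
#   etcd_fault_tolerance 4   # 1
#   etcd_fault_tolerance 5   # 2
```

Four members tolerate no more failures than three, which is why an odd number of control-plane nodes is recommended.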
Gotchas
| Topic | Guidance |
|---|---|
| Cross-zone | Enable cross-zone load balancing when control-plane instances span multiple AZs; align subnets with your HA layout. |
| Sticky sessions | Not required for TCP 6443 pass-through. |
| Public NLB | Avoid unless you explicitly want the Kubernetes API on the internet; prefer VPN/bastion + internal NLB. |
| DNS TTL | NLB DNS is AWS-managed; optional Route53 alias for a friendly name — include that name in cert SANs if you add custom certs. |
| Single subnet labs | HA across AZs needs CPs and NLB subnets in more than one AZ; one subnet limits blast-radius benefits. |
| Health check timing | During the first init, only the initialized node should become healthy; the other registered targets turn healthy after their control-plane join completes. |
Troubleshooting
| Symptom | Likely cause | What to check |
|---|---|---|
| All NLB targets unhealthy | API not listening on 6443 yet, or SG blocks probes | ss -lntp on CP shows a listener on :6443; SG allows VPC/subnet to 6443; kubeadm init completed. |
| kubeadm join timeout to NLB | Wrong endpoint, SG, or route | DNS resolves; nc -zv <nlb-dns> 6443 from joining host; worker SG allowed on CP SG. |
| TLS / x509 errors on join | Endpoint hostname mismatch | controlPlaneEndpoint at init must match join URL; compare apiserver cert SANs. |
| kubectl works via node IP but not NLB | kubeconfig still points at node | Update server: in admin.conf / kubeconfig to NLB URL. |
| API flaps during CP failure | Only one CP or lost etcd quorum | Need 3+ CPs for one failure; check etcdctl endpoint health. |
| Workers NotReady after CP loss | API unreachable or CNI issue | kubelet logs; confirm workers join via NLB, not stale VIP IP. |
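For the TLS / x509 row, one way to compare the apiserver certificate SANs with the endpoint is to read the certificate served through the NLB. This is a sketch with hypothetical function names. It relies on the standard openssl s_client and openssl x509 commands and assumes the usual "DNS:name, DNS:name" layout of the SAN line in the text output.

```shell
# Print the Subject Alternative Name section of the certificate served at host:port ($1).
apiserver_sans() {
  openssl s_client -connect "$1" </dev/null 2>/dev/null \
    | openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name'
}

# Read SAN text on stdin; succeed if it lists the DNS name given as $1.
san_includes() {
  tr ',' '\n' | sed 's/^ *//' | grep -qxF "DNS:$1"
}

# Usage (placeholder hostname):
#   host=internal-abc123.elb.us-east-1.amazonaws.com
#   apiserver_sans "$host:6443" | san_includes "$host" || echo "NLB name missing from cert SANs"
```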
See also
- Nodes and Upgrades — Cluster Maintenance Knowledge
- Kubernetes and Upgrades — Kubernetes Cluster Upgrades
- kubeadm and HA Control Plane — On-Prem & kubeadm Clusters
- Terraform and AWS — Terraform AWS / EKS Templates
- Terraform and Azure — Terraform Azure / AKS Templates
- Terraform and IaC — Terraform From Scratch