HA Control Plane with AWS NLB — K8s SRE Reference

TL;DR

Run an odd number of control-plane nodes (3 is typical) and front them with an internal AWS NLB on TCP 6443. Set kubeadm controlPlaneEndpoint to the NLB DNS name before the first kubeadm init. Register every control-plane EC2 instance in the NLB target group. Workers join, and kubectl connects, through the NLB, never through individual node IPs.

Architecture & What NLB Replaces

A highly available Kubernetes control plane needs two things: etcd quorum (Raft consensus across the control-plane nodes) and a stable API address that does not move when one apiserver host fails. On bare metal or small EC2 labs, teams often provided that address with a floating virtual IP (kube-vip, keepalived). On AWS, an internal NLB is the usual replacement: it exposes one DNS name, health-checks each apiserver on port 6443, and forwards TCP to healthy targets.

Every client that talks to the API — kubectl, worker kubelets, in-cluster controllers, CI jobs, and additional control-plane joins — must use the same endpoint you set at init time. Changing it after certificates are minted is painful; pick the NLB hostname before the first kubeadm init.

Figure 1 — Internal NLB fronts all apiservers on :6443. Workers and kubectl use the NLB DNS name; dashed purple lines show etcd Raft replication between control-plane nodes.

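Tip: the NLB in this design is created without a security group, so traffic is allowed or denied only by the target instances' security groups. (An NLB can use security groups, but only if at least one is associated when it is created.) Open TCP 6443 on the control-plane security group from worker SGs, admin CIDRs, and the VPC CIDR (or NLB subnet ranges) so health checks and forwarded connections succeed.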

AWS Objects to Create

Wire these in Terraform (or equivalent) in the same VPC as your control-plane EC2 instances. Use an internal NLB unless you have a deliberate reason to expose the Kubernetes API on the public internet.

Resource	Key settings	Why it matters
`aws_lb`	`load_balancer_type = "network"`, `internal = true`	Layer-4 pass-through to apiserver; stable DNS name for kubeadm.
`aws_lb_target_group`	Protocol TCP, port 6443, target type `instance`	One registered target per control-plane EC2 instance.
`aws_lb_target_group_attachment`	Attach each `aws_instance.control_plane[*]`	NLB only routes to registered, healthy nodes.
`aws_lb_listener`	TCP 6443 → target group	Single front door for API traffic.
Output	`dns_name` of the NLB	Becomes `controlPlaneEndpoint` and join commands.
Security group rules	6443 on control-plane SG	This NLB is created without an SG, so instance rules must permit health checks and clients.

Terraform Sketch

Place NLB resources in your cluster module after control-plane instances exist. Register targets by instance ID so replacements re-attach cleanly on the next apply.

modules/cluster/nlb.tf (HCL, illustrative)

resource "aws_lb" "control_plane" {
  name               = "${var.project_name}-cp-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.nlb_subnet_ids # private subnets in VPC

  enable_cross_zone_load_balancing = true # spread targets across AZs
}

resource "aws_lb_target_group" "control_plane_api" {
  name        = "${var.project_name}-cp-api"
  port        = 6443
  protocol    = "TCP"
  vpc_id      = var.vpc_id
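  target_type = "instance"

  # Instance targets preserve client IPs by default, and NLBs do not support
  # hairpinning in that mode: a control-plane node calling the API through the
  # NLB can be routed back to itself and time out. Disabling preservation avoids
  # this; forwarded traffic then arrives from the NLB's private IPs.
  preserve_client_ip = false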

  health_check {
    enabled  = true
    protocol = "TCP"
    port     = "6443"
  }
}

resource "aws_lb_target_group_attachment" "control_plane" {
  count            = var.control_plane_instance_count
  target_group_arn = aws_lb_target_group.control_plane_api.arn
  target_id        = aws_instance.control_plane[count.index].id
  port             = 6443
}

resource "aws_lb_listener" "control_plane_api" {
  load_balancer_arn = aws_lb.control_plane.arn
  port              = 6443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.control_plane_api.arn
  }
}

output "control_plane_endpoint" {
  # kubeadm wants host:port — strip scheme if you add https later
  value = "${aws_lb.control_plane.dns_name}:6443"
}
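The output above already includes `:6443`, but if it later changes shape (for example, a scheme gets added), a small guard keeps controlPlaneEndpoint in the host:port form kubeadm expects. This is a minimal sketch: the function name `normalize_cp_endpoint` is hypothetical, and defaulting a missing port to 6443 is an assumption that matches this design.

```bash
#!/usr/bin/env bash
# Hypothetical helper: coerce an endpoint string into the host:port form used
# for kubeadm's controlPlaneEndpoint (no scheme, no path).
normalize_cp_endpoint() {
  local ep="$1"
  ep="${ep#https://}"   # drop an https:// scheme if present
  ep="${ep#http://}"    # ...or an http:// one
  ep="${ep%%/*}"        # drop any path or trailing slash
  case "$ep" in
    *:*) ;;                 # already host:port
    *) ep="${ep}:6443" ;;   # assume the apiserver port used in this design
  esac
  printf '%s\n' "$ep"
}
```

Usage: `normalize_cp_endpoint "$(terraform output -raw control_plane_endpoint)"` prints the value to paste into cluster-config.yaml.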

Security Groups

After the NLB exists, confirm control-plane instances accept API traffic. A common pattern for a private cluster:

Workers → API: allow TCP 6443 from the worker security group to the control-plane security group.
Admins: allow 6443 from bastion or VPN CIDRs for kubectl.
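Health checks: allow 6443 from the VPC CIDR (or the NLB subnet CIDRs), since probes come from the NLB's private IPs. The same rule covers forwarded client traffic if client IP preservation is disabled on the target group.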
Control-plane ↔ control-plane: keep etcd (2379-2380), kubelet (10250), and other control-plane ports open within the control-plane SG (unchanged from single-node setups).
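If these rules are not in Terraform yet, the first three items map onto three `aws ec2 authorize-security-group-ingress` calls. A sketch with placeholder group IDs and CIDRs; the function name `open_apiserver_port` and the `AWS` override variable (set it to `echo` for a dry run) are hypothetical conveniences, not part of the AWS CLI.

```bash
#!/usr/bin/env bash
AWS="${AWS:-aws}"   # hypothetical hook: AWS=echo prints commands instead of running them

# Open TCP 6443 on the control-plane SG for workers, admins, and NLB probes.
# Args: control-plane SG ID, worker SG ID, admin CIDR, VPC CIDR.
open_apiserver_port() {
  local cp_sg="$1" worker_sg="$2" admin_cidr="$3" vpc_cidr="$4"

  # Workers -> API: reference the worker security group as the source
  "$AWS" ec2 authorize-security-group-ingress --group-id "$cp_sg" \
    --protocol tcp --port 6443 --source-group "$worker_sg"

  # Admins (bastion or VPN range) -> API for kubectl
  "$AWS" ec2 authorize-security-group-ingress --group-id "$cp_sg" \
    --protocol tcp --port 6443 --cidr "$admin_cidr"

  # NLB health checks, plus forwarded traffic if client IP preservation is off
  "$AWS" ec2 authorize-security-group-ingress --group-id "$cp_sg" \
    --protocol tcp --port 6443 --cidr "$vpc_cidr"
}
```

For example, `AWS=echo open_apiserver_port sg-cp sg-workers 10.8.0.0/16 10.0.0.0/16` shows the three calls before you run them for real.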

kubeadm: controlPlaneEndpoint

Set the endpoint to the NLB DNS name before the first init. kubeadm bakes this hostname into API server certificates; all joins must match.

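cluster-config.yaml (YAML)

# kubeadm v1beta4 config API (kubeadm v1.31+). On older kubeadm, use
# kubeadm.k8s.io/v1beta3 for both kubeadm documents instead; do not mix versions.
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
clusterName: my-cluster
# NLB DNS from terraform output, not a node IP or old floating VIP
controlPlaneEndpoint: "internal-abc123.elb.us-east-1.amazonaws.com:6443"
networking:
  podSubnet: "10.244.0.0/16"      # must match your CNI (e.g. Cilium kubernetes IPAM)
  serviceSubnet: "10.96.0.0/12"
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
# The cgroup driver belongs in KubeletConfiguration rather than a kubelet flag
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd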

On the first control-plane host, copy this file (or equivalent) and initialize:

first control-plane node (bash)

sudo kubeadm init --config cluster-config.yaml --upload-certs

mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

# install CNI (Cilium CLI or Helm) before expecting nodes Ready
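Because kubeadm bakes controlPlaneEndpoint into the apiserver certificate, it is worth confirming right after init that the NLB hostname actually landed in the SAN list. A minimal sketch built on `openssl x509 -text`; the helper name `cert_has_dns_san` is hypothetical, and the certificate path in the usage line assumes kubeadm's default PKI directory.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if CERT lists HOST as an exact DNS SAN.
cert_has_dns_san() {
  local cert="$1" host="$2"
  openssl x509 -in "$cert" -noout -text \
    | grep -A1 'X509v3 Subject Alternative Name' \
    | tail -n 1 \
    | tr ',' '\n' \
    | sed 's/^[[:space:]]*//' \
    | grep -xF "DNS:${host}" >/dev/null   # exact match, so prefixes do not count
}
```

Usage on the first control-plane: `cert_has_dns_san /etc/kubernetes/pki/apiserver.crt internal-abc123.elb.us-east-1.amazonaws.com && echo "NLB name present"`. A failure here means joins and kubectl through the NLB will hit x509 errors.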

Bootstrap Order

Follow dependency order so the NLB has at least one healthy target before you rely on it for joins. Target groups stay unhealthy until an apiserver listens on 6443.

Figure 2 — Recommended bootstrap sequence. NLB can exist before init, but targets become healthy only after the first apiserver listens on 6443.

Step	Where	Action
1	Terraform	Apply VPC, control-plane EC2, internal NLB, target group, listener, SG rules.
2	First CP	`kubeadm init --config cluster-config.yaml --upload-certs` using NLB DNS in `controlPlaneEndpoint`.
3	First CP	Install CNI; verify `kubectl get nodes`.
4	Other CPs	`kubeadm join <nlb-dns>:6443 --control-plane ...` with token, CA hash, and `--certificate-key`.
5	Workers	`kubeadm join <nlb-dns>:6443 ...` — same endpoint as init.
6	Validation	NLB target health, etcd members, controlled CP failure test (below).

Generate join commands from the first control-plane after CNI is healthy:

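join additional control-plane (bash)

# worker join command (print on control-plane-0; needs admin credentials)
sudo kubeadm token create --print-join-command

# control-plane join: use the NLB host, not a node IP
sudo kubeadm join internal-abc123.elb.us-east-1.amazonaws.com:6443 \
  --control-plane \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --certificate-key <key-from-upload-certs>

# The certificate key printed by `kubeadm init --upload-certs` expires after
# two hours. To re-upload the certs and print a new key, run on control-plane-0:
#   sudo kubeadm init phase upload-certs --upload-certs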
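If the init output with the `sha256:<hash>` value is gone, the hash can be recomputed from the cluster CA. This sketch wraps the openssl pipeline given in the kubeadm documentation; it assumes an RSA CA key (kubeadm's default), and the helper name `ca_cert_hash` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: SHA-256 of the CA's DER-encoded public key, the value
# expected by --discovery-token-ca-cert-hash. Assumes an RSA CA key.
ca_cert_hash() {
  local ca="${1:-/etc/kubernetes/pki/ca.crt}"
  openssl x509 -pubkey -noout -in "$ca" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'   # strip the "(stdin)= " or "SHA2-256(stdin)= " prefix
}
```

On control-plane-0, `echo "sha256:$(ca_cert_hash)"` prints the string to pass to `--discovery-token-ca-cert-hash`.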

Already initialized with a VIP or node IP? Migrating controlPlaneEndpoint on a live cluster requires updating API server certificates, kubeconfigs, and static pod configs. For learning, prefer a fresh cluster with the NLB endpoint set from day one.

HA Testing

Prove the NLB path, not just that three apiservers exist. Baseline first, then fail one control-plane at a time.

Baseline checks

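baseline (bash)

# kubeconfig should point at the NLB (or use --server to override once)
kubectl get nodes
kubectl -n kube-system get pods

# on a control-plane node: etcd membership (stacked etcd with kubeadm;
# kubeadm does not install etcdctl on the host, so install it separately)
sudo ETCDCTL_API=3 etcdctl member list \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# AWS: every control-plane target should report "healthy"
aws elbv2 describe-target-health --target-group-arn <target-group-arn> \
  --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State]' --output table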
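To turn the etcd baseline into a pass/fail check, the `member list` output can be counted. A sketch assuming `-w simple` output, where each line reads `ID, status, name, peer URLs, client URLs, isLearner` (the learner column appears in etcd 3.4+); `count_voting_members` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `etcdctl member list -w simple` on stdin and print
# the number of started, non-learner (voting) members.
count_voting_members() {
  awk -F', ' '$2 == "started" && $NF == "false" { n++ } END { print n + 0 }'
}
```

Adding `-w simple` to the `member list` command and piping it into `count_voting_members` should print 3 for the topology in this guide; anything lower deserves a look before the failure test.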

Controlled failure

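fail one control-plane (bash)

# Option A: stop the apiserver on one CP (software failure). Stopping kubelet
# alone is not enough: static Pods keep running under containerd without it.
ssh control-plane-1 'sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/'

# Option B: stop the EC2 instance (host failure; also takes out one etcd member)
aws ec2 stop-instances --instance-ids <control-plane-1-instance-id>

# From your laptop: keep using the NLB endpoint
kubectl get nodes --request-timeout=10s
kubectl -n kube-system get pods -l tier=control-plane

# Expect: one NLB target unhealthy; the API still answers (for Option B, while 2 of 3 etcd members hold quorum)

# Restore Option A: move the manifest back and kubelet restarts the apiserver
ssh control-plane-1 'sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/'
# Restore Option B: start the instance; re-join only if the node was removed from the cluster
aws ec2 start-instances --instance-ids <control-plane-1-instance-id>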
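During the controlled failure, a single `kubectl get nodes` can get lucky. A short probe loop against `/readyz` through the NLB gives a rough success count instead. A sketch: `probe_api`, `KUBECTL`, and `PROBE_INTERVAL` are hypothetical names, and the 5-second request timeout is an arbitrary choice.

```bash
#!/usr/bin/env bash
KUBECTL="${KUBECTL:-kubectl}"   # hypothetical override hook

# Probe the apiserver readiness endpoint N times via the current kubeconfig
# (which should point at the NLB) and print success/failure counts.
probe_api() {
  local n="${1:-30}" ok=0 fail=0 i
  for ((i = 0; i < n; i++)); do
    if "$KUBECTL" get --raw /readyz --request-timeout=5s >/dev/null 2>&1; then
      ok=$((ok + 1))
    else
      fail=$((fail + 1))
    fi
    sleep "${PROBE_INTERVAL:-1}"
  done
  echo "ok=$ok fail=$fail"
}
```

For example, run `probe_api 60` in a second terminal while performing Option A or B. A few failures while the NLB health check marks the stopped target unhealthy are plausible; a long run of failures points at quorum or security group problems.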

For a 3-node control plane, etcd tolerates one failed member. Losing a second member before the first is restored loses quorum: etcd can no longer commit writes, so the API cannot create, update, or delete objects, although Pods already running on nodes keep running.
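The one-failure figure comes from Raft's majority rule: with n members, quorum is floor(n/2) + 1, so the cluster tolerates n - quorum = floor((n-1)/2) failures. A small sketch that prints this for one to seven members (the function names are hypothetical):

```bash
#!/usr/bin/env bash
# Raft majority math for etcd: quorum = floor(n/2) + 1,
# tolerated failures = n - quorum = floor((n-1)/2).
etcd_quorum()    { echo $(( $1 / 2 + 1 )); }
etcd_tolerated() { echo $(( ($1 - 1) / 2 )); }

etcd_quorum_table() {
  local n
  printf '%-8s %-7s %s\n' members quorum tolerated
  for n in 1 2 3 4 5 6 7; do
    printf '%-8s %-7s %s\n' "$n" "$(etcd_quorum "$n")" "$(etcd_tolerated "$n")"
  done
}
```

Three members give quorum 2 and one tolerated failure; four members raise quorum to 3 but still tolerate only one, which is why odd sizes are recommended.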

Gotchas

Topic	Guidance
Cross-zone	Enable cross-zone load balancing when control-plane instances span multiple AZs; align subnets with your HA layout.
Sticky sessions	Not required for TCP 6443 pass-through.
Public NLB	Avoid unless you explicitly want the Kubernetes API on the internet; prefer VPN/bastion + internal NLB.
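Custom DNS name	The NLB DNS name is AWS-managed; a Route 53 alias can give it a friendlier name. List that name in `apiServer.certSANs` (or use it as `controlPlaneEndpoint`) before init so the apiserver certificate covers it.
Single subnet labs	HA across AZs needs control-plane nodes and NLB subnets in more than one AZ; a single subnet (one AZ) survives a node failure but not an AZ outage.
Health check timing	During the first init, only the initialized node turns healthy; the other registered control-plane targets stay unhealthy until their `kubeadm join --control-plane` completes.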

Troubleshooting

Symptom	Likely cause	What to check
All NLB targets unhealthy	API not listening on 6443 yet, or SG blocks probes	`ss -lntp | grep 6443` on CP; SG allows VPC/subnet to 6443; kubeadm init completed.
`kubeadm join` timeout to NLB	Wrong endpoint, SG, or route	DNS resolves; `nc -zv <nlb-dns> 6443` from joining host; worker SG allowed on CP SG.
TLS / x509 errors on join	Endpoint hostname mismatch	`controlPlaneEndpoint` at init must match join URL; compare apiserver cert SANs.
`kubectl` works via node IP but not NLB	kubeconfig still points at node	Update `server:` in admin.conf / kubeconfig to NLB URL.
API flaps during CP failure	Only one CP, or etcd quorum lost	Need 3+ CPs to survive one failure; check `etcdctl endpoint health --cluster` (same TLS flags as `member list`).
Workers NotReady after CP loss	API unreachable or CNI issue	kubelet logs; confirm workers joined via the NLB, not a stale VIP.
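The DNS and reachability checks from the join-timeout row can be run in one pass from a joining host. A sketch assuming bash's `/dev/tcp` redirection, glibc's `getent`, and coreutils `timeout` are available; it stands in for `nc -zv` on hosts without netcat, and the function name `check_api_endpoint` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical preflight: does ENDPOINT (host:port) resolve, and does a TCP
# connection to it succeed within 5 seconds?
check_api_endpoint() {
  local ep="$1" host port
  host="${ep%:*}"
  port="${ep##*:}"
  if ! getent hosts "$host" >/dev/null; then
    echo "DNS: cannot resolve $host" >&2
    return 1
  fi
  # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT
  if ! timeout 5 bash -c ': >"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "TCP: no connection to ${host}:${port} (check SGs, routes, NLB listener)" >&2
    return 1
  fi
  echo "ok: ${host}:${port} resolves and accepts TCP"
}
```

Usage from a worker before `kubeadm join`: `check_api_endpoint internal-abc123.elb.us-east-1.amazonaws.com:6443`. A DNS failure points at VPC DNS settings; a TCP failure points at security groups or routing.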