TL;DR

Run an odd number of control-plane nodes (3 is typical) and front them with an internal AWS NLB on TCP 6443. Set kubeadm controlPlaneEndpoint to the NLB DNS name before the first kubeadm init. Register every control-plane EC2 instance in the NLB target group; workers join and kubectl connects through the NLB, never through individual node IPs.

Architecture & What NLB Replaces

A highly available Kubernetes control plane needs two things: etcd quorum (Raft consensus across control-plane nodes) and a stable API address that does not move when one apiserver host fails. On bare metal or small EC2 labs, teams often use a floating virtual IP (kube-vip, keepalived). On AWS, an internal NLB is the usual replacement: it exposes one DNS name, health-checks each apiserver on port 6443, and forwards TCP to healthy targets.

Every client that talks to the API — kubectl, worker kubelets, in-cluster controllers, CI jobs, and additional control-plane joins — must use the same endpoint you set at init time. Changing it after certificates are minted is painful; pick the NLB hostname before the first kubeadm init.

[Figure 1 diagram: clients (kubectl, kubelet) and workers (worker-0, worker-1) connect to an internal AWS NLB (TCP :6443, cross-zone enabled, k8s-api.internal.example.com), which forwards to a control plane with an odd node count (3 shown): control-plane-0, control-plane-1, and control-plane-2, each running apiserver, scheduler, controller-manager, and etcd (one leader, two followers).]

Figure 1 — Internal NLB fronts all apiservers on :6443. Workers and kubectl use the NLB DNS name; dashed purple lines show etcd Raft replication between control-plane nodes.

Tip: The NLB in this guide is created without a security group, so traffic is allowed or denied on the target instances. (An NLB can carry security groups only if they are attached when it is created.) Open TCP 6443 on the control-plane security group from worker SGs, admin CIDRs, and the VPC CIDR (or NLB subnet ranges) so health checks and forwarded connections succeed.

AWS Objects to Create

Wire these in Terraform (or equivalent) in the same VPC as your control-plane EC2 instances. Use an internal NLB unless you have a deliberate reason to expose the Kubernetes API on the public internet.

| Resource | Key settings | Why it matters |
| --- | --- | --- |
| aws_lb | load_balancer_type = "network", internal = true | Layer-4 pass-through to apiserver; stable DNS name for kubeadm. |
| aws_lb_target_group | Protocol TCP, port 6443, target type instance | One registered target per control-plane EC2 instance. |
| aws_lb_target_group_attachment | Attach each aws_instance.control_plane[*] | NLB only routes to registered, healthy nodes. |
| aws_lb_listener | TCP 6443 → target group | Single front door for API traffic. |
| Output | dns_name of the NLB | Becomes controlPlaneEndpoint and join commands. |
| Security group rules | 6443 on control-plane SG | NLB does not attach an SG; instance rules must permit probes and clients. |

Terraform Sketch

Place NLB resources in your cluster module after control-plane instances exist. Register targets by instance ID so replacements re-attach cleanly on the next apply.

hcl: modules/cluster/nlb.tf (illustrative)
resource "aws_lb" "control_plane" {
  name               = "${var.project_name}-cp-nlb"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.nlb_subnet_ids # private subnets in VPC

  enable_cross_zone_load_balancing = true # each NLB node can route to targets in every enabled AZ
}

resource "aws_lb_target_group" "control_plane_api" {
  name        = "${var.project_name}-cp-api"
  port        = 6443
  protocol    = "TCP"
  vpc_id      = var.vpc_id
  target_type = "instance"

  health_check {
    enabled  = true
    protocol = "TCP"
    port     = "6443"
  }
}

resource "aws_lb_target_group_attachment" "control_plane" {
  count            = var.control_plane_instance_count
  target_group_arn = aws_lb_target_group.control_plane_api.arn
  target_id        = aws_instance.control_plane[count.index].id
  port             = 6443
}

resource "aws_lb_listener" "control_plane_api" {
  load_balancer_arn = aws_lb.control_plane.arn
  port              = 6443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.control_plane_api.arn
  }
}

output "control_plane_endpoint" {
  # kubeadm wants host:port — strip scheme if you add https later
  value = "${aws_lb.control_plane.dns_name}:6443"
}
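
The output above already emits host:port. If the value ever passes through tooling that adds a scheme or a trailing path, a small guard can normalize it before it lands in controlPlaneEndpoint. The following is a minimal sketch: normalize_endpoint is a hypothetical helper name, the hostname is a placeholder, and IPv6 literals are not handled.

```bash
#!/usr/bin/env bash
# normalize_endpoint (hypothetical helper): reduce a URL-like value to host:port.
normalize_endpoint() {
  local ep="$1"
  ep="${ep#*://}"   # drop a leading scheme such as https://
  ep="${ep%%/*}"    # drop any path after the authority
  case "$ep" in
    *:*) ;;                   # a port is already present
    *)   ep="${ep}:6443" ;;   # assume the default API port
  esac
  printf '%s\n' "$ep"
}

# Placeholder hostname for illustration only.
normalize_endpoint "https://k8s-api.internal.example.com:6443/"
normalize_endpoint "k8s-api.internal.example.com"
```

In a pipeline this might be used as `normalize_endpoint "$(terraform output -raw control_plane_endpoint)"`, assuming the output name from the sketch above.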

Security Groups

After the NLB exists, confirm control-plane instances accept API traffic. A common pattern for a private cluster:

  • Workers → API: allow TCP 6443 from the worker security group to the control-plane security group.
  • Admins: allow 6443 from bastion or VPN CIDRs for kubectl.
  • Health checks: allow 6443 from the VPC CIDR (or per-subnet CIDRs where NLB nodes evaluate targets).
  • Control-plane ↔ control-plane: keep etcd (2379-2380), kubelet (10250), and other control-plane ports open within the control-plane SG (unchanged from single-node setups).
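
If you manage these rules outside Terraform, the first three bullets could be expressed with the AWS CLI roughly as follows. This is a sketch under stated assumptions: every security group ID and CIDR is a made-up placeholder, and the run wrapper is a hypothetical helper that only prints each command unless DRY_RUN=0 is set.

```bash
#!/usr/bin/env bash
# Hypothetical IDs and CIDRs; replace with your own values.
CP_SG="${CP_SG:-sg-0123456789abcdef0}"          # control-plane security group
WORKER_SG="${WORKER_SG:-sg-0fedcba9876543210}"  # worker security group
ADMIN_CIDR="${ADMIN_CIDR:-10.0.100.0/24}"       # bastion or VPN range
VPC_CIDR="${VPC_CIDR:-10.0.0.0/16}"             # covers NLB health-check sources

# Print each command by default; set DRY_RUN=0 to execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Allow TCP 6443 into the control-plane SG from another security group.
allow_api_from_sg() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$CP_SG" --protocol tcp --port 6443 --source-group "$1"
}

# Allow TCP 6443 into the control-plane SG from a CIDR range.
allow_api_from_cidr() {
  run aws ec2 authorize-security-group-ingress \
    --group-id "$CP_SG" --protocol tcp --port 6443 --cidr "$1"
}

allow_api_from_sg "$WORKER_SG"      # workers -> API
allow_api_from_cidr "$ADMIN_CIDR"   # admins (kubectl)
allow_api_from_cidr "$VPC_CIDR"     # NLB health checks
```

Printing first makes it easy to review the exact rules before anything changes in the account.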

kubeadm: controlPlaneEndpoint

Set the endpoint to the NLB DNS name before the first init. kubeadm bakes this hostname into API server certificates; all joins must match.

yaml: cluster-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: my-cluster
# NLB DNS from terraform output — not a node IP or old floating VIP
controlPlaneEndpoint: "internal-abc123.elb.us-east-1.amazonaws.com:6443"
networking:
  podSubnet: "10.244.0.0/16"      # must match your CNI (e.g. Cilium kubernetes IPAM)
  serviceSubnet: "10.96.0.0/12"
---
apiVersion: kubeadm.k8s.io/v1beta3   # same version as ClusterConfiguration; v1beta4 expects kubeletExtraArgs as a name/value list
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  kubeletExtraArgs:
    cgroup-driver: systemd

On the first control-plane host, copy this file (or equivalent) and initialize:

bash: first control-plane node
sudo kubeadm init --config cluster-config.yaml --upload-certs

mkdir -p "$HOME/.kube"
sudo cp /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"

# install CNI (Cilium CLI or Helm) before expecting nodes Ready
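
Because kubeadm writes the endpoint hostname into the apiserver certificate, it can be worth confirming the SAN list right after init. The sketch below assumes the kubeadm default certificate path, OpenSSL 1.1.1 or newer (for the -ext option), and the placeholder NLB hostname used above; san_has_dns is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# san_has_dns (hypothetical helper): succeed when the SAN text on stdin
# contains the DNS name given as $1.
san_has_dns() {
  tr ',' '\n' \
    | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -Fxq "DNS:$1"
}

CERT="${CERT:-/etc/kubernetes/pki/apiserver.crt}"   # kubeadm default location
NLB_HOST="${NLB_HOST:-internal-abc123.elb.us-east-1.amazonaws.com}"  # placeholder

if [ -r "$CERT" ]; then
  if openssl x509 -in "$CERT" -noout -ext subjectAltName | san_has_dns "$NLB_HOST"; then
    echo "OK: $NLB_HOST is in the apiserver certificate SANs"
  else
    echo "MISSING: $NLB_HOST is not in the apiserver certificate SANs"
  fi
else
  echo "skip: $CERT not readable (run this on a control-plane node)"
fi
```

A MISSING result usually means init ran with a different controlPlaneEndpoint than the one clients are using.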

Bootstrap Order

Follow dependency order so the NLB has at least one healthy target before you rely on it for joins. Target groups stay unhealthy until an apiserver listens on 6443.

[Figure 2 diagram: 1. Terraform (EC2 + NLB) → 2. kubeadm init (first CP) → 3. CNI (Cilium, etc.) → 4. Join CPs (--control-plane) → 5. Join workers (via NLB :6443) → 6. Test HA]

Figure 2 — Recommended bootstrap sequence. NLB can exist before init, but targets become healthy only after the first apiserver listens on 6443.

| Step | Where | Action |
| --- | --- | --- |
| 1 | Terraform | Apply VPC, control-plane EC2, internal NLB, target group, listener, SG rules. |
| 2 | First CP | kubeadm init --config cluster-config.yaml --upload-certs using NLB DNS in controlPlaneEndpoint. |
| 3 | First CP | Install CNI; verify kubectl get nodes. |
| 4 | Other CPs | kubeadm join <nlb-dns>:6443 --control-plane ... with token, CA hash, and --certificate-key. |
| 5 | Workers | kubeadm join <nlb-dns>:6443 ... (same endpoint as init). |
| 6 | Validation | NLB target health, etcd members, controlled CP failure test (below). |
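
Between steps 3 and 4 it helps to wait until the API actually answers through the NLB before attempting any join. A minimal sketch of such a gate follows. wait_until and api_ready are hypothetical helper names; the probe assumes the apiserver's /readyz endpoint is reachable without credentials, which is the usual kubeadm default but worth verifying in your cluster.

```bash
#!/usr/bin/env bash
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...  (hypothetical helper)
# Re-runs COMMAND until it succeeds or ATTEMPTS is exhausted.
wait_until() {
  local attempts="$1" delay="$2" i
  shift 2
  for (( i = 1; i <= attempts; i++ )); do
    if "$@"; then return 0; fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  return 1
}

# Probe /readyz on host:port. --insecure skips CA verification, which is
# acceptable only for a reachability check like this one.
api_ready() {
  curl --silent --fail --insecure --max-time 5 "https://$1/readyz" >/dev/null
}

# Example with a placeholder endpoint: gate the joins on the NLB path answering.
# wait_until 60 5 api_ready "internal-abc123.elb.us-east-1.amazonaws.com:6443" \
#   && echo "API reachable through the NLB"
```

Probing through the NLB name rather than a node IP exercises the same path that joining nodes will use.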

Generate join commands from the first control-plane after CNI is healthy:

bash: join commands (workers and additional control-plane nodes)
# worker join (print from control-plane-0)
kubeadm token create --print-join-command

# control-plane join — use NLB host, not node IP
sudo kubeadm join internal-abc123.elb.us-east-1.amazonaws.com:6443 \
  --control-plane \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash> \
  --certificate-key <key-from-upload-certs>

Warning: Already initialized with a VIP or node IP? Migrating controlPlaneEndpoint on a live cluster requires updating API server certificates, kubeconfigs, and static pod configs. For learning, prefer a fresh cluster with the NLB endpoint set from day one.
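
If the sha256 value for --discovery-token-ca-cert-hash has been lost, it can be recomputed from the cluster CA on any control-plane node. The pipeline below follows the approach described in the kubeadm documentation and assumes the default CA path and an RSA CA key (the kubeadm default); ca_cert_hash is a hypothetical wrapper name.

```bash
#!/usr/bin/env bash
# ca_cert_hash (hypothetical wrapper): SHA-256 of the CA public key in DER form.
ca_cert_hash() {
  openssl x509 -pubkey -noout -in "$1" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

CA="${CA:-/etc/kubernetes/pki/ca.crt}"   # kubeadm default location
if [ -r "$CA" ]; then
  echo "sha256:$(ca_cert_hash "$CA")"
else
  echo "skip: $CA not readable (run this on a control-plane node)"
fi

# The uploaded certificates expire after two hours. To obtain a fresh
# --certificate-key, re-upload them on an existing control-plane node:
#   sudo kubeadm init phase upload-certs --upload-certs
```

Bootstrap tokens also expire (24 hours by default), so regenerate the join command if a join is attempted a day or more after init.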

HA Testing

Prove the NLB path, not just that three apiservers exist. Baseline first, then fail one control-plane at a time.

Baseline checks

bash: baseline
# kubeconfig should point at NLB (or use --server override once)
kubectl get nodes
kubectl -n kube-system get pods

# on a control-plane node — etcd membership (stacked etcd with kubeadm)
sudo ETCDCTL_API=3 etcdctl member list \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# AWS: target group should show healthy for each CP (CLI/console)
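
The target-group check mentioned in the last comment can be scripted with the AWS CLI. This is a sketch, not a definitive check: healthy_targets is a hypothetical helper, and TARGET_GROUP_ARN and EXPECTED_CP_COUNT are assumed environment variables you would set from your own Terraform state.

```bash
#!/usr/bin/env bash
# healthy_targets (hypothetical helper): count targets reported as "healthy"
# in the target group whose ARN is passed as $1.
healthy_targets() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].TargetHealth.State' \
    --output text \
    | tr '\t' '\n' \
    | grep -c '^healthy$'
}

# Runs only when TARGET_GROUP_ARN is set, so the snippet is safe to source.
if [ -n "${TARGET_GROUP_ARN:-}" ]; then
  expected="${EXPECTED_CP_COUNT:-3}"
  actual="$(healthy_targets "$TARGET_GROUP_ARN")"
  echo "healthy targets: $actual of $expected"
fi
```

Recording this number before the failure test gives a baseline to compare against afterwards.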

Controlled failure

bash: fail one control-plane
# Option A: stop the apiserver on one CP (software failure).
# Stopping kubelet alone leaves the static-pod containers running, so move the
# manifest aside and let kubelet tear the pod down.
ssh control-plane-1 'sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /root/'

# Option B: stop the EC2 instance (host failure) in the AWS console or CLI

# From your laptop, keep using the NLB endpoint
kubectl get nodes --request-timeout=10s
kubectl -n kube-system get pods -l tier=control-plane

# Expect: one NLB target unhealthy; API still answers if 2-of-3 etcd quorum holds

# Restore: put the manifest back (or start the instance); only re-join if the node was removed from the cluster
ssh control-plane-1 'sudo mv /root/kube-apiserver.yaml /etc/kubernetes/manifests/'
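
A single kubectl call can miss brief outages. To see how the NLB path behaves while a target is being marked unhealthy, a polling loop such as the following sketch can run in a second terminal during the test. probe_api is a hypothetical helper; it assumes your kubeconfig already points at the NLB endpoint.

```bash
#!/usr/bin/env bash
# probe_api COUNT PAUSE_SECONDS  (hypothetical helper)
# Calls the apiserver readiness endpoint COUNT times and reports the totals.
probe_api() {
  local count="$1" pause="$2" ok=0 fail=0 i
  for (( i = 0; i < count; i++ )); do
    if kubectl get --raw='/readyz' --request-timeout=5s >/dev/null 2>&1; then
      ok=$(( ok + 1 ))
    else
      fail=$(( fail + 1 ))
    fi
    sleep "$pause"
  done
  echo "ok=$ok fail=$fail"
}

# Example: one probe per second for a minute while you fail a control-plane node.
# probe_api 60 1
```

A few failures right after the node goes down are plausible until the health check removes the target; sustained failures point at quorum loss or a misrouted endpoint.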

For a 3-node control plane, etcd tolerates one failed member. Losing two at once loses quorum: existing Pods may keep running, but etcd can no longer commit writes, so API requests that change state fail or time out until a majority of members is back.
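
The numbers follow from Raft majority arithmetic: a cluster of N members needs floor(N/2) + 1 votes, so it tolerates floor((N-1)/2) failures. A short sketch (the function names are illustrative) shows why even member counts add no tolerance:

```bash
#!/usr/bin/env bash
# Majority needed for a cluster of N members.
quorum() { echo $(( $1 / 2 + 1 )); }

# Members that can fail while a majority remains.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Four members tolerate the same single failure as three, which is why the guidance is an odd count.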

Gotchas

| Topic | Guidance |
| --- | --- |
| Cross-zone | Enable cross-zone load balancing when control-plane instances span multiple AZs; align subnets with your HA layout. |
| Sticky sessions | Not required for TCP 6443 pass-through. |
| Public NLB | Avoid unless you explicitly want the Kubernetes API on the internet; prefer VPN/bastion + internal NLB. |
| DNS TTL | NLB DNS is AWS-managed; optional Route53 alias for a friendly name. Include that name in cert SANs if you add custom certs. |
| Single subnet labs | HA across AZs needs CPs and NLB subnets in more than one AZ; one subnet limits blast-radius benefits. |
| Health check timing | During first init, only the initialized node should become healthy; others register after join completes. |

Troubleshooting

| Symptom | Likely cause | What to check |
| --- | --- | --- |
| All NLB targets unhealthy | API not listening on 6443 yet, or SG blocks probes | ss -lntp on CP shows a listener on :6443; SG allows VPC/subnet to 6443; kubeadm init completed. |
| kubeadm join timeout to NLB | Wrong endpoint, SG, or route | DNS resolves; nc -zv <nlb-dns> 6443 from joining host; worker SG allowed on CP SG. |
| TLS / x509 errors on join | Endpoint hostname mismatch | controlPlaneEndpoint at init must match join URL; compare apiserver cert SANs. |
| kubectl works via node IP but not NLB | kubeconfig still points at node | Update server: in admin.conf / kubeconfig to NLB URL. |
| API flaps during CP failure | Only one CP or lost etcd quorum | Need 3+ CPs for one failure; check etcdctl endpoint health. |
| Workers NotReady after CP loss | API unreachable or CNI issue | kubelet logs; confirm workers join via NLB, not stale VIP IP. |
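
The DNS and TCP checks from the table can be bundled into one small diagnostic to run from a joining host. This sketch assumes bash (for /dev/tcp), plus getent and timeout as found on typical Linux hosts; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Succeeds when the name resolves through the system resolver.
resolves() { getent hosts "$1" >/dev/null; }

# Succeeds when a TCP connection to host:port opens within 3 seconds.
tcp_open() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# check_endpoint HOST [PORT]: report the first failing layer.
check_endpoint() {
  local host="$1" port="${2:-6443}"
  if ! resolves "$host"; then
    echo "DNS: $host does not resolve"
    return 1
  fi
  if ! tcp_open "$host" "$port"; then
    echo "TCP: $host:$port unreachable (check security groups and routes)"
    return 1
  fi
  echo "OK: $host:$port accepts TCP connections"
}

# Example with the placeholder NLB name used earlier:
# check_endpoint internal-abc123.elb.us-east-1.amazonaws.com 6443
```

A pass here with a failing join narrows the problem to TLS or token issues rather than networking.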

See also