TL;DR

EKS runs the Kubernetes control plane as a managed AWS service; you operate data plane compute (often managed node groups or Karpenter), VPC networking and security groups, IAM (IRSA for pods, instance profiles for nodes), cluster add-ons, and the AWS Load Balancer Controller. Automate foundations with IaC (Terraform); keep YAML for add-ons tuned to AWS limits and upgrade windows.

Architecture & Trust Boundaries

Unlike a kubeadm cluster, you never SSH to control plane nodes: AWS scales and patches the API server, etcd, scheduler, and controller manager. Your responsibility is subnets, IAM, add-ons, workloads, and change windows for Kubernetes version upgrades (AWS rolls out EKS platform versions on its own schedule).

AWS-managed control plane:
  • API server, etcd, scheduler, controller manager (multi-AZ; no shell access)
  • Enrolled in your VPC via ENIs in control-plane subnets
  • Authenticates workloads via IAM & IRSA OIDC issuer

Your VPC (worker subnets):
  • Managed node groups / self-managed / Fargate
  • kubelet registers with API endpoint
  • Pod ENIs (prefix / security groups); CNI (e.g. VPC CNI)
  • NAT / inter-AZ routing for pulls & private APIs

Your AWS account integrations:
  • IAM roles, security groups, KMS, ELB / NLB
  • ECR pulls via node role or kubelet identity
  • ALB Controller + IAM/IRSA creates LBs

Mental split: AWS runs the apiserver/etcd stack; your VPC and IAM wire workers and cloud integrations.

Creating & Accessing Clusters

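eks-access.sh (bash)
# Typical flow after Terraform or eksctl provisioning.
aws eks update-kubeconfig --name prod-platform --region us-east-1

# Kubernetes version and EKS platform version are separate values; the
# platform version (eks.N) is only visible through the EKS API.
aws eks describe-cluster --name prod-platform --region us-east-1 \
  --query 'cluster.{kubernetes:version,platform:platformVersion}'
kubectl version -o yaml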
kubectl get nodes -o wide

# STS caller identity confirms which IAM principal your kubeconfig wrapper uses.
aws sts get-caller-identity

Compute: Node Groups & Alternatives

| Model | You manage | Operators like it when… |
| --- | --- | --- |
| EKS managed node groups (MNG) | AMI family, sizing, subnets, IAM instance profile attached by EKS / launch template | You want AWS to roll AMI patches with defined disruption budgets. |
| Self-managed Auto Scaling groups | Bootstrap script, AMI build, patching cadence | You need custom AMIs or launch templates beyond MNG ergonomics. |
| Fargate profiles | Pod sizing, subnets, selectors only | Burst or low-ops workloads; no DaemonSet-heavy suites. |
| Karpenter / native CA | Scaling rules, interruption handling, quotas | Rapid elasticity and bin-packing; pair with interruption awareness. |

managed-node-labels-shape.yaml (yaml)
# Terraform / eksctl equivalents set this; illustrative node labels/taints shape.
labels:
  workload: general
  topology.kubernetes.io/zone: "${AZ}" # Often set automatically from subnet.
taints:
  - key: "nvidia.com/gpu"
    value: "shared"
    effect: "NoSchedule"

IRSA — Pod IAM Without Static Keys

Map a Kubernetes ServiceAccount to an IAM role whose trust policy references your cluster's OIDC issuer. Provision the IAM role and trust policy with Terraform (see the Terraform IRSA pattern), then annotate the ServiceAccount in manifests or Helm values.

sa-irsa-annotations.yaml (yaml)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sqs-consumer
  namespace: payments
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-app-sqs
    eks.amazonaws.com/sts-regional-endpoints: "true"  # Use the regional STS endpoint: lower latency, no dependency on the global endpoint.

Detailed trust boundaries and SG patterns: AWS IAM & Security Groups.
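
To make the ServiceAccount-to-role mapping concrete, here is a minimal sketch of the trust policy such a role would typically carry. The function name `irsa_trust_policy` and the issuer ID `EXAMPLED539D4633E53DE1B716D3041E` are hypothetical placeholders; the account ID, namespace, and ServiceAccount name reuse the values from the manifest above. Treat it as an illustration of the policy shape, not as the Terraform module's actual output.

```shell
# Render an IRSA trust policy for one ServiceAccount.
# Args: <account-id> <oidc-issuer-without-https> <namespace> <serviceaccount>
irsa_trust_policy() {
  local account="$1" issuer="$2" ns="$3" sa="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account}:oidc-provider/${issuer}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${issuer}:sub": "system:serviceaccount:${ns}:${sa}",
          "${issuer}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}

# The issuer ID below is a made-up placeholder, not a real cluster's issuer.
irsa_trust_policy 123456789012 \
  "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E" \
  payments app-sqs-consumer
```

The `sub` condition is what ties the role to exactly one namespace and ServiceAccount; a mismatch there is the usual reason a correctly annotated pod still fails to assume its role.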

EKS Add-ons & Versioning

AWS distributes tested versions of VPC CNI, CoreDNS, kube-proxy, CSI drivers, Pod Identity Agent, etc. Decide per component whether Helm or the EKS add-on API owns it, so nothing gets installed twice.

| Add-on domain | Examples | Notes |
| --- | --- | --- |
| Networking / DNS | vpc-cni, kube-proxy, CoreDNS | Align versions with Kubernetes platform; plan upgrades with cluster lifecycle. |
| Identity | IAM Pod Identity Agent (optional alternative to IRSA) | Pick one dominant pod-AWS pattern org-wide. |
| Storage | EBS CSI, EFS CSI | Separate IAM/IRSA roles per driver; KMS for encryption contexts. |
| Ingress / external cloud | AWS Load Balancer Controller (Helm usual) | Needs IRSA permissions to manage ELBv2; interacts with subnets tagged for ELB (Terraform snippet). |

addons-list.sh (bash)
aws eks list-addons --cluster-name prod-platform --region us-east-1
aws eks describe-addon --cluster-name prod-platform --addon-name vpc-cni --region us-east-1

AWS Load Balancer Controller

Implements Ingress (Gateway API support is still in progress) on top of Elastic Load Balancing. It depends on subnets tagged per scheme (public for internet-facing, private for internal), an IAM policy granted through IRSA, optional WAF integrations, and the target type you choose (ip vs instance). See Kubernetes Service nuances in Services & Load Balancers.

ingress-alb-minimal.yaml (yaml)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80

Cluster Autoscaler Basics

Cluster Autoscaler inspects Pods stuck in Pending, consults scheduler constraints, scales ASGs/MNGs within their min/max, and drains nodes gracefully on scale-down. It is separate from the Horizontal Pod Autoscaler, which scales pod replicas rather than nodes, and it covers the same ground as Karpenter; which of the two you run is usually an org-level standard.

ca-events.sh (bash)
# RBAC-heavy component: confirm the deployment args match the ASG/MNG discovery tags and cloud provider.
kubectl -n kube-system logs deploy/cluster-autoscaler --tail=200

# Pods waiting on topology or resources: CA scales out only when the scheduler marks a pod unschedulable.
kubectl get events -A --sort-by=.lastTimestamp | tail -40

  • IAM: the controller needs autoscaling:Describe*, autoscaling:SetDesiredCapacity, autoscaling:TerminateInstanceInAutoScalingGroup, and ec2:Describe* permissions per AWS docs (often via IRSA).
  • Each node group exposes min/max/desired caps — CA cannot exceed AWS ASG boundaries.
  • Cluster-wide upgrades happen control-plane-first; cordon/drain node groups thoughtfully.
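
The discovery-tag and min/max points above can be sketched in a few lines. This is an illustration only: the helper names `ca_discovery_arg` and `clamp_desired` are hypothetical, and the tag keys follow the auto-discovery convention documented for Cluster Autoscaler on AWS (`k8s.io/cluster-autoscaler/enabled` plus `k8s.io/cluster-autoscaler/<cluster-name>`).

```shell
# Build the auto-discovery argument Cluster Autoscaler uses to find tagged ASGs.
ca_discovery_arg() {
  local cluster="$1"
  printf '%s\n' "--node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/${cluster}"
}

# Clamp a requested node count into the group's [min, max] range,
# mirroring the way scale-out stops at the ASG maximum.
clamp_desired() {
  local want="$1" min="$2" max="$3"
  if [ "$want" -lt "$min" ]; then
    echo "$min"
  elif [ "$want" -gt "$max" ]; then
    echo "$max"
  else
    echo "$want"
  fi
}

ca_discovery_arg prod-platform
clamp_desired 12 2 10   # prints 10: the request is capped at the group maximum
```

If the printed argument does not match the tags actually present on the ASGs, the autoscaler logs show no node groups discovered and Pending pods never trigger a scale-out.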

Helm Shape: AWS Load Balancer Controller

Below is a representative values.yaml fragment—pin chart versions in your pipeline the same way you pin Terraform providers. IRSA role ARNs must exist before helm upgrade applies.

aws-lb-controller-values-fragment.yaml (yaml)
clusterName: prod-platform
region: us-east-1
vpcId: vpc-0123456789abcdef0
serviceAccount:
  create: true
  name: aws-load-balancer-controller
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-alb-controller
enableServiceMutatorWebhook: true
ingressClassConfig:
  default: true
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    memory: 512Mi
nodeSelector: {}
tolerations: []
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - aws-load-balancer-controller
          topologyKey: kubernetes.io/hostname
defaultTags:
  Environment: prod
  Cluster: prod-platform
logLevel: info
enableShield: false
enableWaf: false
enableWafv2: false
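
One way to apply a values file like this with a pinned chart version is sketched below. The `run` and `install_lbc` helpers are hypothetical, `X.Y.Z` is a placeholder rather than a real chart version, and the script defaults to a dry run that only prints the Helm commands; set `DRY_RUN=0` to execute them.

```shell
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Install or upgrade the controller at an explicitly pinned chart version.
# Args: <chart-version> <values-file>
install_lbc() {
  local chart_version="$1" values_file="$2"
  run helm repo add eks https://aws.github.io/eks-charts
  run helm upgrade --install aws-load-balancer-controller eks/aws-load-balancer-controller \
    --namespace kube-system \
    --version "$chart_version" \
    --values "$values_file"
}

# X.Y.Z is a placeholder: substitute the chart version your pipeline pins.
install_lbc "${CHART_VERSION:-X.Y.Z}" aws-lb-controller-values-fragment.yaml
```

Passing the version as a required argument keeps the pin visible in review, in the same spirit as pinning Terraform providers.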

VPC CNI & Prefix Delegation

Prefix delegation raises IP density per ENI by assigning /28 prefixes (16 addresses each) instead of individual secondary IPs, which matters for dense pod counts that would otherwise hit per-instance ENI and IP limits. Coordinate warm pool settings with application burst patterns; mis-tuned settings show up as FailedCreatePodSandBox events while Services still appear healthy at the control plane.

| Setting family | Why it matters |
| --- | --- |
| WARM_PREFIX_TARGET | Balances pre-allocated prefixes vs cold attach latency during scale-out. |
| ENABLE_PREFIX_DELEGATION | Must align with subnet sizing and routable address expectations. |
| Security groups for Pods | Each additional SG rule can affect effective throughput; review alongside the security groups page. |
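
The density gain from prefix delegation is easiest to see as arithmetic. The sketch below uses the commonly cited VPC CNI formula (ENIs × (IPv4 addresses per ENI − 1) + 2); the function name `max_pods` is hypothetical, and the ENI and address counts are inputs you look up per instance type (3 and 10 for an m5.large).

```shell
# Estimate the pod IP ceiling the VPC CNI imposes on one node.
# Args: <enis> <ipv4-per-eni> <prefix-delegation: true|false>
max_pods() {
  local enis="$1" ips_per_eni="$2" prefix_delegation="$3"
  local per_slot=1
  # With prefix delegation each secondary slot holds a /28 prefix (16 addresses).
  if [ "$prefix_delegation" = "true" ]; then
    per_slot=16
  fi
  # One address per ENI is its primary IP; +2 accounts for host-network pods
  # (aws-node and kube-proxy) that do not consume a pod IP.
  echo $(( enis * (ips_per_eni - 1) * per_slot + 2 ))
}

max_pods 3 10 false   # m5.large without prefix delegation: prints 29
max_pods 3 10 true    # same instance with prefix delegation: prints 434
```

The second figure is an address ceiling, not a scheduling target: the kubelet max-pods setting is normally set far lower than that, so CPU and memory become the binding limits rather than IPs.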

Fargate & Windows Footnotes

  • Fargate: DaemonSets are not supported, so node-level agents (CNI logging, node-exporter patterns) do not run there; move observability into sidecar containers or adopt Fargate-aware agents only.
  • Windows nodes: Separate MNG pools, distinct tolerations on workloads, patch cycles differ from Linux AMIs.
  • Cluster Autoscaler still needs IAM access to each ASG: even if most workloads run on ephemeral Fargate, static MNG pools may coexist.

Operational Checklists

| Area | SRE checks |
| --- | --- |
| VPC hygiene | Subnet tagging for ELBs, NAT path for private pulls, SG rules between control-plane ENIs & workers. |
| Admission & API | An unreachable API server is often an IAM auth or STS partition issue; webhook latency causes cascading failures. |
| Add-on drift | In-cluster Helm vs eksctl-managed vs EKS add-on: unify ownership. |
| Costs | Monitor idle MNG GPU nodes, orphaned ELBs/TargetGroups across namespaces. |

VPC, Subnets & Routing

Worker nodes commonly live on private subnets with NAT gateways for egress. Internet-facing load balancers land in public subnets tagged kubernetes.io/role/elb, while internal load balancers land in private subnets tagged kubernetes.io/role/internal-elb (Terraform sample tags). Cross-AZ SG rules and NACL pitfalls still apply: when NodePort or hostNetwork patterns appear during incidents, correlate with our Services guidance before blindly editing SG ingress.

| Decision | Recommendation |
| --- | --- |
| Single vs multi NAT | Prefer one NAT GW per AZ for HA data-plane egress paths; beware cost vs blast radius trade-offs. |
| IPv6-enabled VPC | Supports dual-stack Services and newer networking features; regression-test CNI & prefix delegation. |
| Restricted outbound | Allow ECR, STS, APIs your IRSA workloads require; egress proxy requires trust bundle injection on nodes. |
| Hybrid cloud routes | BGP/TGW must not overlap Pod CIDR; overlap produces silent half-open TCP sessions. |

validate-subnets.sh (bash)
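# List cluster-tagged subnets together with the role tags the CCM / LB controller use for discovery.
aws ec2 describe-subnets \
  --filters "Name=tag:kubernetes.io/cluster/prod-platform,Values=owned,shared" \
  --query 'Subnets[*].{ID:SubnetId,AZ:AvailabilityZone,Name:Tags[?Key==`Name`].Value|[0],PublicELB:Tags[?Key==`kubernetes.io/role/elb`].Value|[0],InternalELB:Tags[?Key==`kubernetes.io/role/internal-elb`].Value|[0]}'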
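
The hybrid-route warning in the table above comes down to a range comparison, which can be checked before a route is ever advertised. A minimal sketch for IPv4 only, assuming a shell with 64-bit arithmetic; the function names are hypothetical and the CIDRs are illustrative.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<EOF
$1
EOF
  echo $(( (a << 24) + (b << 16) + (c << 8) + d ))
}

# Print "first last" (as integers) for a CIDR such as 10.0.0.0/16.
cidr_range() {
  local ip="${1%/*}" bits="${1#*/}"
  local base size
  base=$(ip_to_int "$ip")
  size=$(( 1 << (32 - bits) ))
  # Mask off host bits so 10.0.5.0/16 is treated as 10.0.0.0/16.
  base=$(( base / size * size ))
  echo "$base $(( base + size - 1 ))"
}

# Exit 0 when the two CIDRs share at least one address.
cidrs_overlap() {
  local a_first a_last b_first b_last
  read -r a_first a_last <<EOF
$(cidr_range "$1")
EOF
  read -r b_first b_last <<EOF
$(cidr_range "$2")
EOF
  [ "$a_first" -le "$b_last" ] && [ "$b_first" -le "$a_last" ]
}

# Pod-bearing subnet vs a range learned over TGW/BGP (both values illustrative).
if cidrs_overlap 10.0.0.0/16 10.0.128.0/20; then echo "overlap"; else echo "disjoint"; fi   # prints overlap
if cidrs_overlap 10.0.0.0/16 100.64.0.0/16; then echo "overlap"; else echo "disjoint"; fi   # prints disjoint
```

Running a check like this in CI against the subnet list and the on-prem route table turns the half-open-session failure mode into a failed pipeline instead of an incident.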

APIServer Authorization & Access Entries

| Mechanism | When it appears | Operational note |
| --- | --- | --- |
| aws-auth ConfigMap (legacy) | ConfigMap-based IAM → Kubernetes RBAC bridging | A broken YAML edit locks out every mapped engineer at once; prefer Git-reviewed changes. |
| EKS access entries API | IAM principal binds to Kubernetes groups or EKS access policies (e.g. cluster admin) | Cleaner audit trails; aligns with SCP-governed principals. |
| Webhook authZ | Open Policy Agent / Kyverno / custom webhooks | Added latency spikes can become cluster-wide outages; monitor apiserver request latency and etcd watch lag. |

kubectl-auth-can-i.sh (bash)
# Validate effective RBAC independent of IAM wrapper (after kubeconfig merges).
kubectl auth can-i list secrets -n kube-system
kubectl auth can-i create pods --as=system:serviceaccount:default:debugger
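
For the access entries row, the sketch below shows roughly what granting a principal read-only access to one namespace looks like with the AWS CLI. The `run` and `grant_namespace_view` helpers and the role name `payments-readonly` are hypothetical; it assumes the cluster's authentication mode already allows access entries, and it defaults to a dry run that only prints the commands.

```shell
# Print the command (DRY_RUN=1, the default) or execute it (DRY_RUN=0).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Grant an IAM principal read-only access to one namespace via access entries.
# Args: <cluster> <principal-arn> <namespace>
grant_namespace_view() {
  local cluster="$1" principal_arn="$2" namespace="$3"
  run aws eks create-access-entry \
    --cluster-name "$cluster" --principal-arn "$principal_arn"
  run aws eks associate-access-policy \
    --cluster-name "$cluster" --principal-arn "$principal_arn" \
    --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
    --access-scope "type=namespace,namespaces=${namespace}"
}

# The role name is a made-up example principal.
grant_namespace_view prod-platform \
  arn:aws:iam::123456789012:role/payments-readonly payments
```

After applying it for real, the `kubectl auth can-i` checks above are a quick way to confirm the principal sees what you intended and nothing more.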

Kubernetes & Platform Upgrades

Advance one minor Kubernetes version per maintenance window; EKS upgrades the control plane sequentially, so do not plan on skipping minors unless AWS publishes explicit guidance that allows it. Rotate node groups progressively: bootstrap node groups on the new AMI, cordon and drain older nodes while honoring PodDisruptionBudgets, and shrink old ASGs only after DaemonSets report healthy replacements.

pre-upgrade-health.sh (bash)
kubectl get apiservice | grep False || true
kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration
kubectl get pods -A | grep -vE 'Running|Completed' || true
kubectl describe nodes | grep -iE 'pressure|Kubelet' || true
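
The one-minor-per-window rule can be enforced as a guard before any upgrade call is made. A minimal sketch, assuming versions in the `major.minor` form that EKS reports; `next_minor` and `is_single_minor_step` are hypothetical helper names, and the `CURRENT` default is a stand-in for the value you would read from the cluster.

```shell
# Print the minor version that follows <major>.<minor>, e.g. 1.29 -> 1.30.
next_minor() {
  local major="${1%%.*}"
  local rest="${1#*.}"
  local minor="${rest%%.*}"   # tolerate an accidental patch suffix such as 1.29.3
  echo "${major}.$(( minor + 1 ))"
}

# Exit 0 only when the target is exactly one minor ahead of the current version.
is_single_minor_step() {
  [ "$(next_minor "$1")" = "$2" ]
}

# In practice CURRENT would come from:
#   aws eks describe-cluster --name "$CLUSTER" --query cluster.version --output text
CURRENT="${CURRENT:-1.29}"
TARGET="${TARGET_MINOR:-1.30}"
if is_single_minor_step "$CURRENT" "$TARGET"; then
  echo "ok: $CURRENT -> $TARGET"
else
  echo "refusing: $CURRENT -> $TARGET is not a single minor step" >&2
fi
```

Placing this check ahead of the upgrade command means a mistyped target fails fast in the pipeline rather than at the EKS API.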

NLB Shape For Service Kind LoadBalancer

Some teams prefer a Service of type LoadBalancer with NLB annotations, while others standardize purely on Ingress. Keep annotations consistent cluster-wide (nlb-target-type, health probes, cross-zone).

svc-nlb-annotations-shape.yaml (yaml)
apiVersion: v1
kind: Service
metadata:
  name: edge-tcp
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: tcp-proxy
  ports:
    - name: tcp5443
      port: 5443
      targetPort: 5443

Troubleshooting Matrix

| Signal | Hypothesis path | Evidence commands |
| --- | --- | --- |
| Nodes NotReady flood | APIServer outage, cgroup pressure, IMDS hops after IRSA regressions | journalctl -u kubelet via SSM/log aggregation; STS CloudTrail anomalies. |
| Pods wedged Pending | Insufficient ASG caps, selectors, DaemonSet starvation | kubectl describe pod + CA logs (Autoscaler). |
| Image pulls fail sporadically | ECR DENY from node role, STS throttling downstream of IRSA | ECR + STS metrics; widen node egress temporarily for triage. |
| Ingress timeouts only from internet | Wrong ALB subnets, TG unhealthy, MTU path issues via VPN | kubectl describe ingress + AWS LB health tab. |
| Webhook TLS errors | Expired serving certs behind cert-manager outage | kubectl get apiservice, apiserver aggregated logs filter. |

Tip (on-prem juxtaposition): Many failure modes mirror vanilla clusters documented in On-Prem Hosting; the difference is that IAM and a managed control plane replace bespoke etcd heroics.

Rolling Control Plane Upgrades (Shape)

managed-upgrade-rollout-shape.sh (bash)
#!/usr/bin/env bash
set -euo pipefail
CLUSTER="${CLUSTER:-prod-platform}"
REGION="${REGION:-us-east-1}"

# 1) advance control plane version after change window approval
aws eks update-cluster-version \
  --name "$CLUSTER" --region "$REGION" \
  --kubernetes-version "${TARGET_MINOR:-1.30}"

# 2) wait until the cluster is ACTIVE again before dependent steps (fixed 30s poll; add backoff for long upgrades)
until aws eks describe-cluster --name "$CLUSTER" --region "$REGION" \
  --query 'cluster.status' --output text | grep -qx ACTIVE; do
  echo "waiting control plane converge..."
  sleep 30
done

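# 3) refresh node AMI / kubelet for every managed node group (names normally come from IaC outputs)
for NODEGROUP in $(aws eks list-nodegroups --cluster-name "$CLUSTER" --region "$REGION" \
  --query 'nodegroups[]' --output text); do
  echo "planned rolling update targeting $NODEGROUP"
  # aws eks update-nodegroup-version --cluster-name "$CLUSTER" --nodegroup-name "$NODEGROUP" --region "$REGION"
done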

# 4) reconcile addons after nodes healthy — ensure compatibility matrix consulted
aws eks list-addons --cluster-name "$CLUSTER" --region "$REGION"

# Document manual verification gates (Ingress smoke, STS IRSA workloads) before declaring complete.

AWS Quotas & Limits To Track

  • ENI quotas per instance type cap pod counts when prefix delegation is disabled.
  • ELB quotas per region—large ingress churn during testing exhausts quotas quickly.
  • EC2 Auto Scaling API rate limits amplified by aggressive Cluster Autoscaler loops.
  • Security group rule counts including cross-referenced LB + node SG combos.
  • IAM roles per account when each micro-service demands unique IRSA role.
  • Route53 ChangeResourceRecordSets throttling mirrored by ExternalDNS logs.
  • CloudWatch Logs ingestion spikes when apiserver audit logging is verbose.
  • EKS addon API throttling surfaced as Terraform apply retries needing backoff.
  • STS regional endpoint throughput during massive rollout events.
  • EBS BurstBalance alarms when log-heavy nodes share gp2 pools.
  • Target group attachment limits per LB complicate multi-namespace ingress designs.
  • API Discovery publish QPS spikes around CRD churn during Helm upgrades.
  • WAF ACL association limits when controllers toggle WAF or Shield settings.
  • Cross-AZ NAT Gateway bandwidth costs mistaken as application latency regressions.
  • KMS requests per second when many pods concurrently decrypt envelopes.
  • Service Quotas uplift tickets should reference FinOps stakeholder approval paths.

Surface limits early in sizing reviews alongside Terraform-driven IaC manifests so limits become code-reviewed constants.
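
A hedged illustration of what "limits as code-reviewed constants" could look like: a small headroom check that warns once usage crosses a threshold. The helper names and every number below are hypothetical placeholders; real limits would come from Service Quotas and real usage from your inventory tooling.

```shell
# Exit 0 while usage stays below the warning threshold (percent of the limit).
# Args: <used> <limit> [threshold-percent, default 80]
quota_ok() {
  local used="$1" limit="$2" threshold="${3:-80}"
  [ $(( used * 100 )) -lt $(( limit * threshold )) ]
}

# Print one status line per tracked quota.
check_quota() {
  local name="$1" used="$2" limit="$3"
  if quota_ok "$used" "$limit"; then
    echo "OK    $name $used/$limit"
  else
    echo "WARN  $name $used/$limit"
  fi
}

# Illustrative figures only, not your account's actual usage or limits.
check_quota "load balancers per region" 41 50     # prints WARN (82% used)
check_quota "IAM roles per account" 300 1000      # prints OK (30% used)
```

Keeping the limit and threshold next to the Terraform code that consumes the quota makes a limit change show up in the same review as the design that depends on it.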

Gotchas

  • VPC mismatch: wrong subnets → nodes never join or LBs provision with the wrong SG.
  • IRSA annotation typo: subtle namespace/SA mismatch → the SDK silently falls back to the node role, so the pod runs with permissions nobody intended.
  • Security group sprawl: default cluster SG edits can break apiserver/worker signaling; track changes carefully.
  • Add-on duplication: two CoreDNS controllers or vpc-cni versions cause hard-to-debug iptables/IPAM errors.
  • CA vs PDB: aggressive PodDisruptionBudgets can block scale-down for long periods.
  • Ingress LB pending: usually IAM/IRSA on the LB controller pod or missing subnet tags; correlate with Events.
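
The IRSA fallback gotcha is quick to detect from inside a pod by comparing the caller identity with the role you expected. A minimal sketch: the function names, the node role name `prod-node-role`, and the session names are hypothetical, and the ARN argument would in practice come from `aws sts get-caller-identity --query Arn --output text` run in the affected container.

```shell
# Extract the role name from an STS assumed-role ARN such as
# arn:aws:sts::123456789012:assumed-role/prod-app-sqs/session-name
role_from_sts_arn() {
  local arn="$1" rest
  case "$arn" in
    arn:aws*:sts::*:assumed-role/*/*)
      rest="${arn#*:assumed-role/}"   # -> prod-app-sqs/session-name
      echo "${rest%%/*}"
      ;;
    *)
      return 1
      ;;
  esac
}

# Report whether the caller is running as the role IRSA was meant to inject.
# Args: <expected-role-name> <caller-arn>
check_irsa_identity() {
  local expected_role="$1" caller_arn="$2" actual
  actual=$(role_from_sts_arn "$caller_arn") || {
    echo "not an assumed role: $caller_arn"
    return 1
  }
  if [ "$actual" = "$expected_role" ]; then
    echo "IRSA ok: $actual"
  else
    echo "IRSA mismatch: expected $expected_role, running as $actual"
    return 1
  fi
}

check_irsa_identity prod-app-sqs \
  "arn:aws:sts::123456789012:assumed-role/prod-app-sqs/example-session"
# A pod that fell back to the node's instance profile would look like this:
check_irsa_identity prod-app-sqs \
  "arn:aws:sts::123456789012:assumed-role/prod-node-role/i-0123456789abcdef0" || true
```

A mismatch here points back at the ServiceAccount annotation or the trust policy's namespace and ServiceAccount condition rather than at the application code.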