EKS Deep Dive — K8s SRE Reference

TL;DR

EKS runs the Kubernetes control plane as a managed AWS service; you operate data plane compute (often managed node groups or Karpenter), VPC networking and security groups, IAM (IRSA for pods, instance profiles for nodes), cluster add-ons, and the AWS Load Balancer Controller. Automate foundations with IaC (Terraform); keep YAML for add-ons tuned to AWS limits and upgrade windows.

Architecture & Trust Boundaries

Unlike a kubeadm cluster, you never SSH to Kubernetes control plane nodes: AWS scales and patches the apiserver plane. Your responsibility is subnets, IAM, add-ons, workloads, and change windows during EKS platform version upgrades.

Mental split: AWS runs the apiserver/etcd stack; your VPC and IAM wire workers and cloud integrations.

Creating & Accessing Clusters

eks-access.sh (bash)

# Typical flow after Terraform or eksctl provisioning.
aws eks update-kubeconfig --name prod-platform --region us-east-1

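# Kubernetes version as reported by the API server and the kubectl client.
kubectl version -o yaml
# The EKS platform version is tracked separately from the Kubernetes minor; read both from the EKS API.
aws eks describe-cluster --name prod-platform --region us-east-1 \
  --query 'cluster.{kubernetes:version,platform:platformVersion}'
kubectl get nodes -o wide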

# STS caller identity confirms which IAM principal your kubeconfig wrapper uses.
aws sts get-caller-identity

Compute: Node Groups & Alternatives

Model	You manage	Operators like it when…
EKS managed node groups (MNG)	AMI family, sizing, subnets, IAM instance profile attached by EKS/LT	You want AWS to roll AMI patches with defined disruption budgets.
Self-managed Auto Scaling Groups	bootstrap script, AMI build, patching cadence	You need custom AMIs or launch templates beyond MNG ergonomics.
Fargate profiles	pod sizing, subnets, selectors only	Burst/low-ops workloads; no DaemonSet-heavy suites.
Karpenter / native CA	scaling rules, interruption handling, quotas	Rapid elasticity and bin-packing; pair with interruption awareness.

managed-node-labels-shape.yaml (yaml)

# Terraform / eksctl equivalents set this; illustrative node labels/taints shape.
labels:
  workload: general
  topology.kubernetes.io/zone: "${AZ}" # Normally populated automatically from the node's AZ; shown for shape only.
taints:
  - key: "nvidia.com/gpu"
    value: "shared"
    effect: "NoSchedule"
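
As a rough illustration of where those labels and taints live when eksctl owns the node group, the fragment below sketches a managed node group entry in an eksctl ClusterConfig. The node group name, instance type, and sizes are hypothetical; the zone label is omitted because nodes normally receive it automatically.

```yaml
# Hypothetical eksctl ClusterConfig fragment; names and sizes are illustrative.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: prod-platform
  region: us-east-1
managedNodeGroups:
  - name: gpu-shared            # hypothetical node group name
    instanceType: g5.xlarge     # assumption: any GPU-capable instance type
    minSize: 0
    maxSize: 4
    desiredCapacity: 1
    privateNetworking: true     # place nodes in private subnets
    labels:
      workload: gpu
    taints:
      - key: nvidia.com/gpu
        value: shared
        effect: NoSchedule
```

Workloads that should land on these nodes need a matching toleration; everything else is kept off by the NoSchedule taint.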

IRSA — Pod IAM Without Static Keys

Map a Kubernetes ServiceAccount to an IAM role through your cluster's OIDC issuer. Provision the IAM role and its trust policy with the Terraform IRSA pattern, then annotate the ServiceAccount in manifests or Helm values.

sa-irsa-annotations.yaml (yaml)

apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-sqs-consumer
  namespace: payments
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-app-sqs
    eks.amazonaws.com/sts-regional-endpoints: "true"  # Use the regional STS endpoint instead of the global one.
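
The IAM side of this mapping is a trust policy that scopes the role to exactly one namespace and ServiceAccount. The sketch below prints such a policy from shell variables; the function name `irsa_trust_policy` and the OIDC issuer ID are hypothetical placeholders, and in practice the policy would normally be rendered by Terraform rather than by hand.

```bash
#!/usr/bin/env bash
# Print an IAM trust policy that lets one ServiceAccount assume a role via IRSA.
# Args: <account-id> <oidc-provider-host/path> <namespace> <serviceaccount>
irsa_trust_policy() {
  local account="$1" oidc="$2" ns="$3" sa="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account}:oidc-provider/${oidc}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${oidc}:sub": "system:serviceaccount:${ns}:${sa}",
          "${oidc}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}

# Example call; the issuer ID below is a placeholder, not a real cluster.
irsa_trust_policy 123456789012 \
  oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE \
  payments app-sqs-consumer
```

The `sub` condition is what ties the role to `payments/app-sqs-consumer`; a mismatch there is the usual cause of AssumeRoleWithWebIdentity denials. The issuer URL for a cluster can be read with `aws eks describe-cluster --query cluster.identity.oidc.issuer`.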

Detailed trust boundaries and SG patterns: AWS IAM & Security Groups.

EKS Add-ons & Versioning

AWS distributes tested versions of VPC CNI, CoreDNS, kube-proxy, CSI drivers, Pod Identity Agent, etc. Decide who owns Helm vs EKS-managed add-ons to avoid duplication.

Add-on domain	Examples	Notes
Networking / DNS	vpc-cni, kube-proxy, CoreDNS	Align versions with Kubernetes platform; plan upgrades with cluster lifecycle.
Identity	IAM Pod Identity Agent (optional alternative to IRSA)	Pick one dominant pod-AWS pattern org-wide.
Storage	EBS CSI, EFS CSI	Separate IAM/IRSA roles per driver; KMS for encryption contexts.
Ingress / external cloud	AWS Load Balancer Controller (Helm usual)	Needs IRSA permissions to manage ELBv2; interacts with subnets tagged for ELB (Terraform snippet).

addons-list.sh (bash)

aws eks list-addons --cluster-name prod-platform --region us-east-1
aws eks describe-addon --cluster-name prod-platform --addon-name vpc-cni --region us-east-1

AWS Load Balancer Controller

Implements Ingress (with Gateway API support in progress) on top of AWS Elastic Load Balancing. It depends on subnets tagged per scheme (public or internal), an IRSA IAM policy, optional WAF integrations, and the choice of target type (ip vs instance). See Kubernetes Service nuances in Services & Load Balancers.

ingress-alb-minimal.yaml (yaml)

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb
  rules:
    - host: web.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web
                port:
                  number: 80

Cluster Autoscaler Basics

Cluster Autoscaler watches for Pods stuck in Pending, simulates scheduler constraints, scales ASGs/MNGs within their min/max bounds, and drains nodes gracefully on scale-down. It is separate from the Horizontal Pod Autoscaler (which scales pod replicas) and covers the same node-scaling role as Karpenter; which one a team runs usually follows org standards.

ca-events.sh (bash)

# RBAC-heavy component — confirm deployment args match ASG/MNG discovery tags/cloud provider.
kubectl -n kube-system logs deploy/cluster-autoscaler --tail=200

# Pods waiting for topology / resources: CA scales out only when pods are unschedulable on existing capacity.
kubectl get events -A --sort-by=.lastTimestamp | tail -40

IAM: the controller needs autoscaling Describe*, SetDesiredCapacity, and TerminateInstanceInAutoScalingGroup plus a few ec2 Describe* actions per AWS docs (often granted via IRSA).
Each node group exposes min/max/desired caps — CA cannot exceed AWS ASG boundaries.
Cluster-wide upgrades happen control-plane-first; cordon/drain node groups thoughtfully.
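
For orientation, the container arguments that drive ASG discovery might look like the fragment below. This is a sketch, not a complete Deployment: the image tag is an assumption (match the autoscaler minor to the cluster minor), and the discovery tags must actually be present on the ASGs behind your node groups.

```yaml
# Illustrative container spec fragment for the cluster-autoscaler Deployment.
containers:
  - name: cluster-autoscaler
    # Assumption: tag chosen to match a 1.30 cluster.
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # Discover ASGs by tag instead of listing them one by one.
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/prod-platform
      - --balance-similar-node-groups
      - --skip-nodes-with-system-pods=false
      - --expander=least-waste
```

If the log output shows no node groups discovered, the tags on the ASGs are the first thing to compare against the `--node-group-auto-discovery` value.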

Helm Shape: AWS Load Balancer Controller

Below is a representative values.yaml fragment—pin chart versions in your pipeline the same way you pin Terraform providers. IRSA role ARNs must exist before helm upgrade applies.

aws-lb-controller-values-fragment.yaml (yaml)

clusterName: prod-platform
region: us-east-1
vpcId: vpc-0123456789abcdef0
serviceAccount:
  create: true
  name: aws-load-balancer-controller
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/prod-alb-controller
enableServiceMutatorWebhook: true
ingressClassConfig:
  default: true
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    memory: 512Mi
nodeSelector: {}
tolerations: []
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - aws-load-balancer-controller
          topologyKey: kubernetes.io/hostname
defaultTags:
  Environment: prod
  Cluster: prod-platform
logLevel: info
enableShield: false
enableWaf: false
enableWafv2: false

VPC CNI & Prefix Delegation

Prefix delegation increases IP density per ENI—critical for dense pod counts before hitting ENI quotas. Coordinate warm pool settings with application burst patterns; mis-tuned settings show up as FailedCreatePodSandBox events while Services still appear healthy at the control plane.

Setting family	Why it matters
`WARM_PREFIX_TARGET`	Balances pre-allocated prefixes vs cold attach latency during scale-out.
`ENABLE_PREFIX_DELEGATION`	Must align with subnet sizing and routable address expectations.
Security groups for Pods	Each additional SG rule can affect effective throughput; review alongside the security group page.
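
To make the density argument concrete, the sketch below applies the commonly documented VPC CNI formula: ENIs × (IPv4 addresses per ENI − 1) + 2 in secondary-IP mode, with each usable slot becoming a /28 prefix (16 addresses) under prefix delegation. The function name `max_pods` is hypothetical, and the optional cap reflects the recommended per-node ceiling, passed in as an assumption rather than looked up.

```bash
#!/usr/bin/env bash
# max_pods <enis> <ipv4-per-eni> [prefix_delegation: 0|1] [cap]
# Estimates the pod ceiling for an instance type under the VPC CNI.
max_pods() {
  local enis="$1" ips="$2" pd="${3:-0}" cap="${4:-0}" pods
  if [ "$pd" -eq 1 ]; then
    # Each secondary slot holds a /28 prefix, i.e. 16 addresses.
    pods=$(( enis * (ips - 1) * 16 + 2 ))
  else
    pods=$(( enis * (ips - 1) + 2 ))
  fi
  # Apply the caller-supplied ceiling, if any.
  if [ "$cap" -gt 0 ] && [ "$pods" -gt "$cap" ]; then
    pods=$cap
  fi
  echo "$pods"
}

max_pods 3 10          # m5.large (3 ENIs, 10 IPs each), secondary-IP mode: prints 29
max_pods 3 10 1 110    # same instance with prefix delegation, capped: prints 110
```

The uncapped prefix-delegation result for the same instance is 434, which shows why the practical limit shifts from ENI quotas to kubelet max-pods settings and subnet address space.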

Fargate & Windows Footnotes

Fargate: DaemonSets are not supported, so node-level agents (CNI logging, node-exporter patterns) do not run there; move observability into sidecar containers in your pods or adopt Fargate-aware agents only.
Windows nodes: Separate MNG pools, distinct tolerations on workloads, patch cycles differ from Linux AMIs.
Cluster Autoscaler still needs IAM awareness for each ASG—even if workloads are ephemeral Fargate, static MNG pools may coexist.

Operational Checklists

Area	SRE checks
VPC hygiene	Subnet tagging for ELBs, NAT path for private pulls, SG rules between control-plane ENIs & workers.
Admission & API	An unreachable API server often traces to IAM auth or STS partition issues; webhook latency causes cascading failures.
Add-on drift	In-cluster Helm vs eksctl-managed vs EKS add-on — unify ownership.
Costs	Monitor idle MNG GPU nodes, orphaned ELBs/TargetGroups across namespaces.

VPC, Subnets & Routing

Worker nodes commonly live on private subnets with NAT gateways for egress. Internet-facing load balancers for public Ingress are placed in public subnets tagged kubernetes.io/role/elb, while internal load balancers use private subnets tagged kubernetes.io/role/internal-elb (Terraform sample tags). Cross-AZ SG rules and NACL pitfalls still apply: when NodePort or hostNetwork patterns appear during incidents, correlate with our Services guidance before blindly editing SG ingress.
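
A small helper can keep the two role tags straight when subnets are tagged outside Terraform. The sketch below only prints the `aws ec2 create-tags` command unless `DRY_RUN=0` is set; the function name `tag_subnet_for_lb` and the subnet IDs are hypothetical.

```bash
#!/usr/bin/env bash
# tag_subnet_for_lb <subnet-id> <public|internal>
# Prints the tagging command by default; set DRY_RUN=0 to execute it.
tag_subnet_for_lb() {
  local subnet="$1" scheme="$2" key
  case "$scheme" in
    public)   key="kubernetes.io/role/elb" ;;
    internal) key="kubernetes.io/role/internal-elb" ;;
    *) echo "scheme must be public or internal" >&2; return 1 ;;
  esac
  # Reuse the positional parameters to hold the command and its arguments.
  set -- aws ec2 create-tags --resources "$subnet" --tags "Key=${key},Value=1"
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical subnet IDs.
tag_subnet_for_lb subnet-0aaa1111bbbb22223 public
tag_subnet_for_lb subnet-0ccc3333dddd44445 internal
```

Defaulting to dry-run makes the output reviewable in a change ticket before anything is modified.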

Decision	Recommendation
Single vs multi NAT	Prefer one NAT GW per AZ for HA data-plane egress paths; beware cost vs blast radius trade-offs.
IPv6-enabled VPC	Supports dual-stack Services and newer networking features; regression-test CNI & prefix delegation.
Restricted outbound	Allow ECR, STS, APIs your IRSA workloads require; egress proxy requires trust bundle injection on nodes.
Hybrid cloud routes	BGP/TGW must not overlap Pod CIDR; overlap produces silent half-open TCP sessions.
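
The overlap rule in the last row is easy to check mechanically before routes are advertised. The following is a minimal sketch in portable shell arithmetic for IPv4 only; the function names are hypothetical and the CIDRs are examples, not values from any real environment.

```bash
#!/usr/bin/env bash
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
  local IFS=.
  # Intentionally unquoted: split the address on dots into $1..$4.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print "start end" (inclusive integer range) for a CIDR such as 10.0.0.0/16.
cidr_range() {
  local ip="${1%/*}" bits="${1#*/}" base mask
  base=$(ip_to_int "$ip")
  mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
  echo "$(( base & mask )) $(( (base & mask) | (4294967295 ^ mask) ))"
}

# Return 0 when the two CIDRs share at least one address.
cidrs_overlap() {
  local r1 r2 s1 e1 s2 e2
  r1=$(cidr_range "$1"); r2=$(cidr_range "$2")
  s1=${r1% *}; e1=${r1#* }
  s2=${r2% *}; e2=${r2#* }
  [ "$s1" -le "$e2" ] && [ "$s2" -le "$e1" ]
}

# Example: a pod range carved out of the VPC CIDR overlaps it.
if cidrs_overlap 10.0.0.0/16 10.0.128.0/20; then echo "overlap"; else echo "disjoint"; fi
```

Running a check like this against every on-prem prefix received over BGP or TGW is cheaper than diagnosing half-open sessions after the fact.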

validate-subnets.sh (bash)

# Compare AWS subnet tags consumed by CCM / LB controller automation.
# The cluster tag value is either "owned" or "shared"; match both.
aws ec2 describe-subnets \
  --filters "Name=tag:kubernetes.io/cluster/prod-platform,Values=owned,shared" \
  --query 'Subnets[*].{ID:SubnetId,AZ:AvailabilityZone,Name:Tags[?Key==`Name`].Value|[0]}' \
  --output table

APIServer Authorization & Access Entries

Mechanism	When it appears	Operational note
aws-auth ConfigMap (legacy)	ConfigMap-based IAM→Kubernetes RBAC bridging on older clusters	A malformed YAML edit can lock out every engineer at once; prefer Git-reviewed changes.
EKS access entries API	IAM principal binds to Kubernetes groups / cluster-admin flags	Cleaner audit trails; aligns with SCP-governed principals.
Admission webhooks	Open Policy Agent / Kyverno / custom webhooks	Added latency can escalate into cluster-wide outages; watch apiserver request and admission latency.
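
As a sketch of the access entries flow, the script below prints the two EKS API calls that grant an IAM role read-only access to a single namespace. It runs in dry-run mode unless `DRY_RUN=0` is set; the helper names and the role ARN are hypothetical, and the policy shown is the AWS-managed `AmazonEKSViewPolicy`.

```bash
#!/usr/bin/env bash
CLUSTER="${CLUSTER:-prod-platform}"
REGION="${REGION:-us-east-1}"

# Echo the command when DRY_RUN=1 (default); execute it otherwise.
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

# grant_namespace_view <principal-arn> <namespace>
grant_namespace_view() {
  local principal="$1" ns="$2"
  # 1) Register the IAM principal with the cluster.
  run aws eks create-access-entry \
    --cluster-name "$CLUSTER" --region "$REGION" \
    --principal-arn "$principal" --type STANDARD
  # 2) Attach a managed access policy scoped to one namespace.
  run aws eks associate-access-policy \
    --cluster-name "$CLUSTER" --region "$REGION" \
    --principal-arn "$principal" \
    --policy-arn arn:aws:eks::aws:cluster-access-policy/AmazonEKSViewPolicy \
    --access-scope "type=namespace,namespaces=$ns"
}

# Hypothetical role ARN for illustration.
grant_namespace_view arn:aws:iam::123456789012:role/payments-readonly payments
```

Because both calls go through the EKS API, they show up in CloudTrail, which is the audit-trail advantage over hand-edited ConfigMap entries.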

kubectl-auth-can-i.sh (bash)

# Validate effective RBAC independent of IAM wrapper (after kubeconfig merges).
kubectl auth can-i list secrets -n kube-system
kubectl auth can-i create pods --as=system:serviceaccount:default:debugger

Kubernetes & Platform Upgrades

Advance one minor Kubernetes version per maintenance window; EKS control plane upgrades move one minor at a time, so plan a multi-version jump as a sequence of windows. Rotate node groups progressively: bootstrap node groups on the new AMI, cordon and drain older nodes while honoring PodDisruptionBudgets, and shrink old ASGs only after DaemonSets report healthy replacements.
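
Drains only honor disruption budgets that exist. A minimal PodDisruptionBudget might look like the sketch below; the name, namespace, and label are hypothetical and assume a Deployment whose pods carry `app: web`.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web              # hypothetical; pairs with pods labeled app=web
  namespace: default
spec:
  maxUnavailable: 1      # lets a drain evict one pod at a time
  selector:
    matchLabels:
      app: web
```

A budget that never allows an eviction (for example `maxUnavailable: 0`) stalls node rotation, so review budgets before the window rather than during it.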

pre-upgrade-health.sh (bash)

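# Aggregated APIs that are not Available can break discovery during upgrades.
kubectl get apiservice | grep False || true
# Inventory admission webhooks that could reject or delay calls mid-upgrade.
kubectl get validatingwebhookconfiguration,mutatingwebhookconfiguration
# Pods that are not Running/Completed before the change window opens.
kubectl get pods -A | grep -vE 'Running|Completed' || true
# Node pressure conditions and kubelet versions.
kubectl describe nodes | grep -iE 'pressure|Kubelet' || true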

NLB Shape For Service Kind LoadBalancer

Some teams prefer a Kubernetes Service of type LoadBalancer with NLB annotations, while others standardize purely on Ingress. Keep annotations consistent cluster-wide (nlb-target-type, health probes, cross-zone).

svc-nlb-annotations-shape.yaml (yaml)

apiVersion: v1
kind: Service
metadata:
  name: edge-tcp
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internet-facing"
spec:
  type: LoadBalancer
  selector:
    app: tcp-proxy
  ports:
    - name: tcp5443
      port: 5443
      targetPort: 5443

Troubleshooting Matrix

Signal	Hypothesis path	Evidence commands
nodes NotReady flood	APIServer outage, cgroup pressure, IMDS hops after IRSA regressions	`journalctl -u kubelet` via SSM/log aggregation; STS CloudTrail anomalies.
Pods wedged Pending	Insufficient ASG caps, selectors, DaemonSet starvation	`kubectl describe pod` + CA logs (Autoscaler).
Image pulls fail sporadically	ECR DENY from node role, STS throttling downstream of IRSA	ECR + STS metrics; widen node egress temporarily for triage.
Ingress timeouts only from internet	wrong ALB subnets, TG unhealthy, MTU path issues via VPN	`kubectl describe ingress` + AWS LB health tab.
Webhook TLS errors	Expired serving certs behind cert-manager outage	`kubectl get apiservice`, apiserver aggregated logs filter.

On-prem juxtaposition: Many failure modes mirror vanilla clusters documented in On-Prem Hosting; the difference is that IAM and a managed control plane replace bespoke etcd heroics.

Rolling Control Plane Upgrades (Shape)

managed-upgrade-rollout-shape.sh (bash)

#!/usr/bin/env bash
set -euo pipefail
CLUSTER="${CLUSTER:-prod-platform}"
REGION="${REGION:-us-east-1}"

# 1) advance control plane version after change window approval
aws eks update-cluster-version \
  --name "$CLUSTER" --region "$REGION" \
  --kubernetes-version "${TARGET_MINOR:-1.30}"

# 2) wait until ACTIVE between dependent steps — poll with backoff externally
until aws eks describe-cluster --name "$CLUSTER" --region "$REGION" \
  --query 'cluster.status' --output text | grep -qx ACTIVE; do
  echo "waiting control plane converge..."
  sleep 30
done

# 3) refresh node AMI / kubelet per nodegroup name from IaC outputs
NODEGROUP=$(aws eks list-nodegroups --cluster-name "$CLUSTER" --region "$REGION" \
  --query 'nodegroups[0]' --output text)
echo "planned rolling update targeting $NODEGROUP"

# 4) reconcile addons after nodes healthy — ensure compatibility matrix consulted
aws eks list-addons --cluster-name "$CLUSTER" --region "$REGION"

# Document manual verification gates (Ingress smoke, STS IRSA workloads) before declaring complete.
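
A guard in front of step 1 can refuse a target that is not exactly one minor ahead of the current version. The sketch below is one way to express that check; the function name `is_next_minor` is hypothetical, and it assumes versions in plain `MAJOR.MINOR` form such as `1.29`.

```bash
#!/usr/bin/env bash
# Return 0 only when target is exactly one minor above current (same major).
# Both arguments are "MAJOR.MINOR" strings, for example 1.29 and 1.30.
is_next_minor() {
  local cur_major="${1%%.*}" cur_minor="${1#*.}"
  local tgt_major="${2%%.*}" tgt_minor="${2#*.}"
  [ "$cur_major" -eq "$tgt_major" ] && [ "$tgt_minor" -eq $(( cur_minor + 1 )) ]
}

if is_next_minor 1.29 1.30; then echo "ok: single-minor step"; fi
if ! is_next_minor 1.28 1.30; then echo "refuse: skips a minor"; fi
```

The current version can be read with `aws eks describe-cluster --query cluster.version --output text` and passed as the first argument, with `TARGET_MINOR` as the second.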

AWS Quotas & Limits To Track

ENI and per-ENI IP limits for each instance type cap pod density when prefix delegation is disabled.
ELB quotas per region—large ingress churn during testing exhausts quotas quickly.
EC2 Auto Scaling API rate limits amplified by aggressive Cluster Autoscaler loops.
Security group rule counts including cross-referenced LB + node SG combos.
IAM roles per account when each micro-service demands unique IRSA role.
Route53 ChangeResourceRecordSets throttling mirrored by ExternalDNS logs.
CloudWatch Logs ingestion spikes when apiserver audit logging is verbose.
EKS addon API throttling surfaced as Terraform apply retries needing backoff.
STS regional endpoint throughput during massive rollout events.
EBS BurstBalance alarms when log-heavy nodes run on gp2 volumes.
Target group attachment limits per LB complicate multi-namespace ingress designs.
API Discovery publish QPS spikes around CRD churn during Helm upgrades.
WAF ACL association limits pairing with controllers toggling shields.
Cross-AZ NAT Gateway bandwidth costs mistaken as application latency regressions.
KMS requests per second when many pods concurrently decrypt envelopes.
Service Quotas uplift tickets should reference FinOps stakeholder approval paths.

Surface limits early in sizing reviews alongside Terraform-driven IaC manifests so limits become code-reviewed constants.

Gotchas

VPC mismatch: wrong subnets → nodes never join, or LBs are provisioned with the wrong SG.
IRSA annotation typo: a subtle namespace/SA mismatch → the SDK falls back to the node role, so the pod runs with unexpected permissions.
Security group sprawl: default cluster SG edits can break apiserver/worker signaling; track changes carefully.
Add-on duplication: two CoreDNS deployments or conflicting vpc-cni versions cause hard-to-debug iptables/IPAM errors.
CA vs PDB: aggressive PodDisruptionBudgets can block scale-down for long periods.
Ingress LB pending: usually IAM/IRSA on the LB controller pod or missing subnet tags; correlate with Events.
