AWS VPC CNI — K8s SRE Reference

TL;DR

The amazon-vpc-cni-k8s plugin assigns real VPC IPs to pods from worker ENIs. Dense clusters exhaust secondary IPs unless you enable prefix delegation, right-size subnets, or move to an overlay CNI. Debug with the aws-node DaemonSet logs and subnet IP utilization—not only Kubernetes Events.

How VPC CNI Differs From Overlay CNIs

In overlay mode, Calico and Cilium typically need only one routable VPC IP per node; pods draw addresses from an internal pod CIDR. VPC CNI instead consumes one subnet IP per pod and routes it natively in the VPC. That simplifies security groups and VPC flow logs but makes subnet sizing the bottleneck.

Model	Pod IP source	When to prefer
VPC CNI (default EKS)	Worker subnet / ENI prefixes	Native SG-per-pod, L3/L4 visibility in VPC flow logs, minimal NAT hairpins.
Overlay (Cilium/Calico)	Internal pod network	Very dense nodes, multi-cloud portability, eBPF/kube-proxy replacement.

ENIs, Secondary IPs & Prefix Delegation

Each node attaches one or more Elastic Network Interfaces (ENIs). Without prefix delegation, the CNI allocates discrete secondary private IPs, capped by the instance type's ENI count and per-ENI address limit. With prefix delegation (supported on Nitro-based instances), the CNI assigns a /28 prefix (16 addresses) to each ENI slot instead of a single IP, which sharply increases pod density on smaller instances.
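The density difference can be estimated with the arithmetic AWS uses in its max-pods guidance. The following is a minimal sketch, assuming the usual formulas: ENIs x (IPs per ENI - 1) + 2 for secondary IPs, and 16 addresses per slot with prefix delegation. The function names are illustrative, and the m5.large limits used here (3 ENIs, 10 IPv4 addresses per ENI) are an assumption to verify against current EC2 documentation.

```shell
#!/usr/bin/env bash
# Illustrative helpers (not part of any AWS tool).
# Args: <max ENIs for the instance type> <IPv4 addresses per ENI>

# Secondary-IP mode: one address per ENI is the ENI's own primary IP and is
# not handed to pods; +2 accounts for host-network pods (aws-node, kube-proxy).
max_pods_secondary_ip() {
  echo $(( $1 * ($2 - 1) + 2 ))
}

# Prefix delegation: each non-primary slot holds a /28 prefix (16 addresses).
max_pods_prefix_delegation() {
  echo $(( $1 * ($2 - 1) * 16 + 2 ))
}

# Assumed limits for m5.large: 3 ENIs, 10 IPv4 addresses per ENI.
echo "secondary IPs:     $(max_pods_secondary_ip 3 10)"
echo "prefix delegation: $(max_pods_prefix_delegation 3 10)"
```

The prefix-delegation figure is an address ceiling only; the kubelet max-pods setting and node CPU/memory still cap what actually schedules.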

Environment variable	Purpose
`ENABLE_PREFIX_DELEGATION=true`	Use /28 prefixes instead of single-IP allocations on ENIs.
`WARM_PREFIX_TARGET=1`	Pre-warm one extra prefix to reduce cold-start latency when bursts schedule many pods.
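`WARM_IP_TARGET` / `MINIMUM_IP_TARGET`	Per-IP warm-pool tuning that balances wasted addresses against scheduling speed. Mainly used when prefix delegation is off; if set alongside prefix delegation, they take precedence over `WARM_PREFIX_TARGET`.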

vpc-cni-addon-env.yaml (YAML)

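# EKS managed add-on configurationValues (schema varies by add-on version).
# Here env is a map of variable name to string value; the name/value list
# form belongs in the aws-node DaemonSet container spec instead.
env:
  ENABLE_PREFIX_DELEGATION: "true"
  WARM_PREFIX_TARGET: "1"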
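One way to apply these values to a managed add-on is through the AWS CLI. This is a hedged sketch, assuming the add-on schema accepts `env` as a map of names to string values; the cluster name and region are the placeholders used elsewhere in this page, and the `APPLY` switch is a convention of this example only.

```shell
#!/usr/bin/env bash
# Placeholders: override via environment for a real cluster.
CLUSTER_NAME="${CLUSTER_NAME:-prod-platform}"
REGION="${REGION:-us-east-1}"

# JSON form of the configuration values (assumed schema: env is a map).
CONFIG_VALUES='{"env":{"ENABLE_PREFIX_DELEGATION":"true","WARM_PREFIX_TARGET":"1"}}'

apply_prefix_delegation() {
  # PRESERVE keeps existing in-cluster customizations when they conflict.
  aws eks update-addon \
    --cluster-name "$CLUSTER_NAME" \
    --addon-name vpc-cni \
    --configuration-values "$CONFIG_VALUES" \
    --resolve-conflicts PRESERVE \
    --region "$REGION"
}

# Dry run by default: print the payload; set APPLY=1 to call the API.
if [ "${APPLY:-0}" = "1" ]; then
  apply_prefix_delegation
else
  echo "$CONFIG_VALUES"
fi
```

Nodes that are already running may keep their earlier max-pods value, so replacing nodes is usually part of the rollout.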

EKS summary and auth patterns: EKS Deep Dive — VPC CNI.

Security Groups for Pods

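With security groups for pods, each matching pod gets its own branch ENI (hung off a trunk ENI on the node) that carries the pod-specific SGs. The feature requires ENABLE_POD_ENI=true on the CNI and a supported instance type. Every such pod consumes a branch ENI, and branch ENIs per node are capped by instance type, so review per-node branch ENI limits, SG rule quotas, and inter-AZ traffic. Pair with AWS IAM & Security Groups before opening broad ingress.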

sgp-policy-sample.yaml (YAML)

apiVersion: vpcresources.k8s.aws/v1beta1
kind: SecurityGroupPolicy
metadata:
  name: payments-api-sgp
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: api
  securityGroups:
    groupIds:
      - sg-0abc123def4567890
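After applying a policy like the one above, it helps to confirm that the branch-ENI path is actually active. The helpers below are a sketch with illustrative function names; they assume a self-managed aws-node DaemonSet (a managed add-on may revert `kubectl set env` changes) and rely on the `vpc.amazonaws.com/has-trunk-attached` node label and `vpc.amazonaws.com/pod-eni` pod annotation.

```shell
#!/usr/bin/env bash
# Sketch only: functions are defined here but not invoked.

# Turn on the branch-ENI feature on the aws-node DaemonSet.
enable_pod_eni() {
  kubectl set env daemonset aws-node -n kube-system ENABLE_POD_ENI=true
}

# List nodes that have a trunk ENI attached.
list_trunk_nodes() {
  kubectl get nodes -l vpc.amazonaws.com/has-trunk-attached=true
}

# Print the branch-ENI annotation of one pod (empty if none was assigned).
# Args: <namespace> <pod-name>
show_pod_eni() {
  kubectl get pod "$2" -n "$1" \
    -o jsonpath='{.metadata.annotations.vpc\.amazonaws\.com/pod-eni}'
}
```

For the sample policy, `show_pod_eni payments <pod-name>` on a pod labeled `app: api` should print a non-empty annotation once the policy has taken effect.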

Custom Networking & ENIConfig

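By default, pods draw IPs from the same subnet as the node's primary ENI. Custom networking uses ENIConfig custom resources to place pod ENIs in alternate subnets (e.g. dedicated pod subnets per AZ). It requires AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG=true and one ENIConfig object per AZ; each node selects its ENIConfig by name, either through a node label or annotation or through ENI_CONFIG_LABEL_DEF. Plan routing and IPAM before enabling in production.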

eniconfig-sample.yaml (YAML)

apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: us-east-1a-pod-net
spec:
  subnet: subnet-0podaaaa111111111
  securityGroups:
    - sg-0workersg111111111
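Per-AZ ENIConfig objects are repetitive, so they can be generated. The sketch below uses a hypothetical `render_eniconfig` helper and placeholder subnet and security group IDs. It names each object after its AZ, on the assumption that nodes will select it through `ENI_CONFIG_LABEL_DEF=topology.kubernetes.io/zone`.

```shell
#!/usr/bin/env bash
# render_eniconfig <az-name> <subnet-id> <security-group-id>
# Prints one ENIConfig manifest named after the AZ.
render_eniconfig() {
  cat <<EOF
apiVersion: crd.k8s.amazonaws.com/v1alpha1
kind: ENIConfig
metadata:
  name: $1
spec:
  subnet: $2
  securityGroups:
    - $3
EOF
}

# Placeholder IDs: substitute the real pod subnets and security group.
render_eniconfig us-east-1a subnet-0podaaaa111111111 sg-0workersg111111111
echo "---"
render_eniconfig us-east-1b subnet-0podbbbb222222222 sg-0workersg111111111
```

The output can be piped to `kubectl apply -f -`. An object with a name that is not the AZ name, such as `us-east-1a-pod-net`, instead needs each node to reference it explicitly by label or annotation.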

Verify & Observe

vpc-cni-check.sh (bash)

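# CNI daemon logs (the DaemonSet is normally aws-node; the EKS add-on is named vpc-cni).
# daemonset/... picks a single pod; target the pod on the affected node when debugging one node.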
kubectl logs -n kube-system daemonset/aws-node --tail=100

# Addon version and configuration
aws eks describe-addon --cluster-name prod-platform --addon-name vpc-cni --region us-east-1

# Subnet IP pressure (replace subnet IDs from your node/pod subnets)
aws ec2 describe-subnets --subnet-ids subnet-aaa subnet-bbb \
  --query 'Subnets[*].{Id:SubnetId,Available:AvailableIpAddressCount,AZ:AvailabilityZone}'
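To turn the subnet query into an alert-style check, the output can be filtered against a free-IP threshold. This is a minimal sketch; `flag_low_subnets` and the threshold of 64 are illustrative choices, and the sample rows are made-up data in the shape that `--output text` produces for a list query.

```shell
#!/usr/bin/env bash
# flag_low_subnets <min-free-ips>
# Reads "<subnet-id> <available-ips> <az>" rows on stdin (whitespace or
# tab separated) and prints the rows whose free-IP count is below the minimum.
flag_low_subnets() {
  awk -v min="$1" '$2 + 0 < min + 0 { printf "%s %s %s\n", $1, $2, $3 }'
}

# Made-up sample rows; only the first is under the threshold.
printf 'subnet-aaa\t12\tus-east-1a\nsubnet-bbb\t900\tus-east-1b\n' \
  | flag_low_subnets 64
```

Against a live account, the input could come from `aws ec2 describe-subnets --subnet-ids subnet-aaa subnet-bbb --query 'Subnets[*].[SubnetId,AvailableIpAddressCount,AvailabilityZone]' --output text | flag_low_subnets 64`.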

Troubleshooting

Symptom	Likely cause	Fix vector
`FailedCreatePodSandBox`	Subnet or ENI IP exhaustion	Enable prefix delegation; expand pod subnets; reduce `WARM_*` waste.
Pods schedule only in one AZ	Subnet out of IPs in other AZs	Balance subnet sizing per AZ; check ENIConfig AZ labels.
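Intermittent AWS API errors in CNI logs	Node IAM / IMDS hop limits	Confirm the role used by aws-node (node role or IRSA role) grants the `AmazonEKS_CNI_Policy` actions (e.g. `ec2:CreateNetworkInterface`, `ec2:AttachNetworkInterface`, `ec2:AssignPrivateIpAddresses`); check the IMDSv2 hop limit (`HttpPutResponseHopLimit`) on the instance.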
SG policy not applied	Branch ENI feature disabled or wrong labels	Confirm `ENABLE_POD_ENI=true`; check that the podSelector labels match the pod; validate the SecurityGroupPolicy CRD and controller version.
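Two of the rows above can be checked directly from the command line. The helpers below are a sketch with illustrative function names; the instance ID is supplied by the caller, and the hop limit of 2 is a common choice for containerized workloads rather than a value taken from this page.

```shell
#!/usr/bin/env bash
# Sketch only: functions are defined here but not invoked.

# Recent sandbox-creation failures across all namespaces.
list_sandbox_failures() {
  kubectl get events --all-namespaces \
    --field-selector reason=FailedCreatePodSandBox
}

# Current IMDSv2 hop limit for one instance. Args: <instance-id>
show_hop_limit() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[].Instances[].MetadataOptions.HttpPutResponseHopLimit' \
    --output text
}

# Raise the hop limit so pods one network hop away can reach IMDS.
# Args: <instance-id>
raise_hop_limit() {
  aws ec2 modify-instance-metadata-options --instance-id "$1" \
    --http-put-response-hop-limit 2 --http-endpoint enabled
}
```

For node groups built from a launch template, the hop limit is better set in the template so that replacement nodes inherit it.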

Cross-cloud CNI triage playbook: Cloud Infrastructure Failures — CNI.

When to Switch CNI Model

If subnet math never closes (thousands of pods per /24), evaluate Cilium or Calico overlay on EKS—accept different operational trade-offs (tunneling, policy model, kube-proxy replacement). Migration is a planned project, not a hotfix.
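The subnet math can be made concrete. AWS reserves five addresses in every subnet, so a rough lower bound on the number of subnets needed follows from the prefix length. This sketch uses illustrative function names and ignores node IPs, warm pools, and other consumers such as load balancers.

```shell
#!/usr/bin/env bash
# usable_ips <prefix-length>: addresses left after the 5 AWS-reserved ones.
usable_ips() {
  echo $(( (1 << (32 - $1)) - 5 ))
}

# subnets_needed <pod-count> <prefix-length>: ceiling of pods / usable IPs.
subnets_needed() {
  local usable
  usable=$(usable_ips "$2")
  echo $(( ($1 + usable - 1) / usable ))
}

echo "/24 usable IPs: $(usable_ips 24)"
echo "/20 usable IPs: $(usable_ips 20)"
echo "/24 subnets for 3000 pods: $(subnets_needed 3000 24)"
```

If the resulting subnet count cannot be carved out of the VPC address space, that is the signal to weigh an overlay.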