Terraform Dev/Prod Cluster Scaffold
Keep Dev and Prod as separate Terraform roots with separate backend state. Put reusable platform logic in modules/cluster, and let each environment pass only its own account, region, naming, sizing, and feature flags into that module.
Directory Layout
Use this scaffold when the cluster is the main platform unit and the same blueprint should create both Dev and Prod without sharing state.
terraform/
  modules/
    cluster/
      main.tf        # VPC, cluster, node pools, IAM, add-on dependencies.
      variables.tf   # Stable inputs environments are allowed to change.
      outputs.tf     # Cluster name, endpoint, OIDC issuer, subnet IDs.
      templates/
        control-plane-user-data.sh.tftpl
        worker-user-data.sh.tftpl
        eks-node-user-data.sh.tftpl
  envs/
    dev/
      versions.tf
      providers.tf
      backend.tf         # State key such as clusters/dev/terraform.tfstate.
      main.tf            # Calls ../../modules/cluster.
      terraform.tfvars   # Small node counts, cheaper instance classes.
    prod/
      versions.tf
      providers.tf
      backend.tf         # State key such as clusters/prod/terraform.tfstate.
      main.tf
      terraform.tfvars   # HA sizing, tighter access, production tags.

Environment Roots
Each environment root is deployable by itself. Do not run Terraform from modules/cluster; run it from envs/dev or envs/prod.
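A root is only deployable by itself if it pins its own Terraform and provider versions and configures its own provider. The following is a minimal sketch of the Dev root's versions.tf and providers.tf, assuming AWS (as the S3 backend and EKS examples in this scaffold suggest). The version constraints and the ManagedBy tag are assumptions, not values taken from this scaffold.

```hcl
# envs/dev/versions.tf (sketch; version constraints are assumptions)
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# envs/dev/providers.tf (sketch)
provider "aws" {
  region = "us-east-1"

  # default_tags adds these tags to most taggable resources this provider creates,
  # so the environment tag does not have to be repeated per resource.
  default_tags {
    tags = {
      Environment = "dev"
      ManagedBy   = "terraform" # illustrative tag, not from the scaffold
    }
  }
}
```

The Prod root would carry the same two files with its own tag values, and with different credentials or account settings where the risk model requires them.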
# envs/dev/main.tf
module "cluster" {
  source          = "../../modules/cluster"
  environment     = "dev"
  name            = "platform-dev"
  region          = "us-east-1"
  cluster_version = "1.30"
  node_groups = {
    general = {
      min_size       = 1
      max_size       = 3
      desired_size   = 1
      instance_types = ["t3.large"]
      capacity_type  = "SPOT"
    }
  }
  tags = {
    CostCenter = "platform-dev"
  }
}

# envs/prod/main.tf
module "cluster" {
  source          = "../../modules/cluster"
  environment     = "prod"
  name            = "platform-prod"
  region          = "us-east-1"
  cluster_version = "1.30"
  node_groups = {
    general = {
      min_size       = 3
      max_size       = 8
      desired_size   = 3
      instance_types = ["m6i.large"]
      capacity_type  = "ON_DEMAND"
    }
  }
  tags = {
    CostCenter  = "platform-prod"
    Criticality = "high"
  }
}

Separate State
State is the most important isolation boundary. Dev and Prod may share a backend bucket, but each needs its own state key at minimum. Depending on the client's risk model, they may also need separate workspaces, credentials, or cloud accounts.
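As a sketch of that key separation, the Dev root's backend.tf could reuse the same bucket and lock table while pointing at its own key. The bucket and table names mirror the placeholder values used in this scaffold. The key follows the clusters/dev/terraform.tfstate pattern from the directory layout.

```hcl
# envs/dev/backend.tf (sketch; bucket and table names are placeholders)
terraform {
  backend "s3" {
    bucket         = "client-terraform-state"
    key            = "clusters/dev/terraform.tfstate" # Dev-only state object
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```

Backend blocks cannot reference variables, so each root carries its own literal key. The Prod root's block differs only in that key.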
# envs/prod/backend.tf
terraform {
  backend "s3" {
    bucket         = "client-terraform-state"
    key            = "clusters/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}

Cluster Module Contract
The module contract should expose environment differences without leaking implementation details. Keep the inputs boring, typed, and reviewable.
variable "environment" {
  type        = string
  description = "Environment name, such as dev or prod."
}
variable "name" {
  type        = string
  description = "Cluster and platform resource name prefix."
}
variable "region" {
  type        = string
  description = "Cloud region for this cluster."
}
variable "cluster_version" {
  type        = string
  description = "Kubernetes control plane version."
}
variable "node_groups" {
  description = "Managed node group definitions keyed by workload class."
  type = map(object({
    min_size       = number
    max_size       = number
    desired_size   = number
    instance_types = list(string)
    capacity_type  = string
  }))
}
variable "tags" {
  type        = map(string)
  description = "Additional tags applied to cluster resources."
  default     = {}
}

Node Bootstrap Templates
Use *.tftpl files when Terraform creates compute that must bootstrap itself at first boot. For kubeadm-style clusters, control-plane and worker nodes need different user data. For managed EKS node groups, user data is usually lighter because EKS owns the control plane and the EKS-optimized AMI already knows the bootstrap flow.
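Because both kubeadm templates take a node_index, user data can be rendered once per node instead of for a single fixed index. The following is a minimal sketch, assuming a hypothetical worker_count variable that is not part of the module contract in this scaffold.

```hcl
variable "name" {
  type        = string
  description = "Cluster and platform resource name prefix."
}

# Hypothetical input used only for this illustration.
variable "worker_count" {
  type        = number
  description = "Number of kubeadm worker nodes to render user data for."
  default     = 3
}

locals {
  # One rendered script per worker; element i receives node_index = i.
  worker_user_data = [
    for i in range(var.worker_count) :
    templatefile("${path.module}/templates/worker-user-data.sh.tftpl", {
      project_name = var.name
      node_index   = i
    })
  ]
}
```

Inside the templates, `${project_name}` and `${node_index}` are filled in by templatefile when the plan is computed. `$${NODE_ROLE}` is an escape that reaches the shell as `${NODE_ROLE}`.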
locals {
  control_plane_user_data = templatefile("${path.module}/templates/control-plane-user-data.sh.tftpl", {
    project_name = var.name
    node_index   = 0
  })
  worker_user_data = templatefile("${path.module}/templates/worker-user-data.sh.tftpl", {
    project_name = var.name
    node_index   = 0
  })
}

kubeadm Control Plane User Data
This template prepares a self-managed control-plane host: disables swap, loads Kubernetes networking modules, installs containerd and kubeadm tooling, and writes a next-steps helper for kubeadm init. The kube-vip manifest supports an HA control-plane endpoint when multiple control-plane nodes are used.
#!/bin/bash
set -euxo pipefail
exec > >(tee /var/log/${project_name}-control-plane-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1
export DEBIAN_FRONTEND=noninteractive
KUBERNETES_VERSION="v1.30"
NODE_ROLE="control-plane"
NODE_INDEX="${node_index}"
echo "Starting bootstrap for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"
swapoff -a || true
sed -ri 's/^\s*([^#].*\sswap\s+sw\s+.*)$/# \1/' /etc/fstab || true
cat >/etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat >/etc/sysctl.d/99-kubernetes-cri.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
apt-get update
apt-get install -y apt-transport-https ca-certificates curl gpg containerd
mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable --now containerd
mkdir -p /etc/apt/keyrings
curl -fsSL "https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/Release.key" \
| gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/ /" \
>/etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
systemctl enable kubelet
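# Placeholder values: set VIP to an unused address in the control-plane subnet
# and INTERFACE to the host's primary network interface name.
export VIP=172.31.1.100
export INTERFACE=eth0
export KVVERSION="v1.2.0-rc.0"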
ctr image pull "ghcr.io/kube-vip/kube-vip:$${KVVERSION}"
mkdir -p /etc/kubernetes/manifests
ctr run --rm --net-host "ghcr.io/kube-vip/kube-vip:$${KVVERSION}" vip /kube-vip manifest pod --interface "$${INTERFACE}" --address "$${VIP}" --arp --controlplane --services --leaderElection | tee /etc/kubernetes/manifests/kube-vip.yaml
cat >/usr/local/bin/bootstrap-control-plane-next-steps.sh <<'EOF'
#!/bin/bash
set -euo pipefail
cat <<'MSG'
Control plane prerequisites are installed.
Suggested next steps on the first control-plane node:
1. Run kubeadm init with --control-plane-endpoint using the VIP address.
2. Configure kubectl from /etc/kubernetes/admin.conf.
3. Install Cilium or another CNI.
4. Validate kube-system Pods and node readiness.
5. Generate worker and additional control-plane join commands.
MSG
EOF
chmod +x /usr/local/bin/bootstrap-control-plane-next-steps.sh
echo "Bootstrap complete for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"

kubeadm Worker User Data
The worker template installs the same Linux, container runtime, and kubelet prerequisites, but leaves joining to the token generated by the initialized control-plane node.
#!/bin/bash
set -euxo pipefail
exec > >(tee /var/log/${project_name}-worker-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1
export DEBIAN_FRONTEND=noninteractive
KUBERNETES_VERSION="v1.30"
NODE_ROLE="worker"
NODE_INDEX="${node_index}"
echo "Starting bootstrap for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"
swapoff -a || true
sed -ri 's/^\s*([^#].*\sswap\s+sw\s+.*)$/# \1/' /etc/fstab || true
cat >/etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat >/etc/sysctl.d/99-kubernetes-cri.conf <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system
apt-get update
apt-get install -y apt-transport-https ca-certificates curl gpg containerd
mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable --now containerd
mkdir -p /etc/apt/keyrings
curl -fsSL "https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/Release.key" \
| gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/ /" \
>/etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm
apt-mark hold kubelet kubeadm
systemctl enable kubelet
cat >/usr/local/bin/bootstrap-worker-next-steps.sh <<'EOF'
#!/bin/bash
set -euo pipefail
cat <<'MSG'
Worker prerequisites are installed.
Next step:
Run the join command generated from the control-plane node:
sudo kubeadm join <api-server-endpoint>:6443 --token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
MSG
EOF
chmod +x /usr/local/bin/bootstrap-worker-next-steps.sh
echo "Bootstrap complete for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"

Managed EKS Node Group User Data
For standard managed EKS node groups, prefer the EKS-optimized AMI and managed node group defaults. Add custom user data only for small host-level setup such as package mirrors, labels, taints, kubelet flags, or log agent prerequisites.
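When the launch template does not set image_id, so the node group uses the EKS-provided Amazon Linux AMI, EKS merges its own bootstrap into the user data and expects custom user data as a MIME multi-part archive. One way to produce that format in Terraform is the cloudinit_config data source from the hashicorp/cloudinit provider. The following is a minimal sketch under those assumptions. The data source name, filename, and version constraint are illustrative.

```hcl
terraform {
  required_providers {
    cloudinit = {
      source  = "hashicorp/cloudinit"
      version = "~> 2.3" # assumed constraint
    }
  }
}

data "cloudinit_config" "eks_node" {
  gzip          = false
  base64_encode = true

  part {
    content_type = "text/x-shellscript"
    filename     = "host-prep.sh"
    content      = <<-EOT
      #!/bin/bash
      set -euo pipefail
      # Host-level preparation only; no kubelet or bootstrap calls here.
      yum install -y amazon-cloudwatch-agent || true
    EOT
  }
}

# Already base64-encoded, so it can be assigned to a launch template's
# user_data without wrapping it in base64encode().
output "eks_node_user_data" {
  value = data.cloudinit_config.eks_node.rendered
}
```

A script that calls the bootstrap itself applies to the other case, where the launch template pins a specific AMI.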
#!/bin/bash
set -euxo pipefail
exec > >(tee /var/log/${cluster_name}-eks-node-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1
echo "Starting EKS managed node bootstrap for ${cluster_name}"
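# Optional host-level preparation. Keep this small so EKS AMI bootstrap remains the owner.
yum install -y amazon-cloudwatch-agent || true
# Call bootstrap.sh directly only when the launch template pins an Amazon Linux 2
# EKS-optimized AMI via image_id. Without image_id, EKS injects its own bootstrap
# and expects this file to be a MIME multi-part archive with no bootstrap.sh call.
/etc/eks/bootstrap.sh "${cluster_name}" \
  --kubelet-extra-args "--node-labels=environment=${environment},nodegroup=${node_group_name}"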
echo "EKS managed node bootstrap complete for ${cluster_name}"

resource "aws_launch_template" "managed_node_group" {
  # var.name already carries the environment suffix (platform-dev, platform-prod).
  name_prefix = "${var.name}-"
  user_data = base64encode(templatefile("${path.module}/templates/eks-node-user-data.sh.tftpl", {
    cluster_name    = module.eks.cluster_name
    environment     = var.environment
    node_group_name = "general"
  }))
}
module "eks" {
  source          = "terraform-aws-modules/eks/aws"
  version         = "~> 20.0"
  cluster_name    = var.name
  cluster_version = var.cluster_version
  # vpc_id, subnet_ids, and access settings are omitted from this excerpt.
  eks_managed_node_groups = {
    general = {
      min_size       = 2
      max_size       = 6
      desired_size   = 3
      instance_types = ["m6i.large"]
      # Use the external template instead of the one the module would create.
      create_launch_template = false
      launch_template_id     = aws_launch_template.managed_node_group.id
      # A concrete version number avoids the perpetual diff that "$Latest" causes.
      launch_template_version = aws_launch_template.managed_node_group.latest_version
    }
  }
}

Guardrails
| Practice | Reason |
|---|---|
| Separate envs/dev and envs/prod | Plans, applies, and state are scoped to one environment. |
| Use separate backend keys | A Dev apply cannot mutate Prod state by accident. |
| Pass differences through variables | The shared module stays consistent while sizing and policy vary. |
| Tag every resource with environment | Cost, ownership, and incident review stay traceable. |
| Keep global resources in a separate root | DNS zones, registries, and shared IAM boundaries usually outlive clusters. |
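The "pass differences through variables" row can be made concrete inside the module by deriving the node group map from var.node_groups instead of hard-coding sizes. The following is a minimal sketch, assuming the node_groups and environment variables from the module contract and the eks_managed_node_groups input of the terraform-aws-modules/eks module. The local name and the labels are illustrative.

```hcl
variable "environment" {
  type        = string
  description = "Environment name, such as dev or prod."
}

variable "node_groups" {
  description = "Managed node group definitions keyed by workload class."
  type = map(object({
    min_size       = number
    max_size       = number
    desired_size   = number
    instance_types = list(string)
    capacity_type  = string
  }))
}

locals {
  # One entry per workload class; sizing comes entirely from the environment root.
  eks_managed_node_groups = {
    for group_name, group in var.node_groups : group_name => {
      min_size       = group.min_size
      max_size       = group.max_size
      desired_size   = group.desired_size
      instance_types = group.instance_types
      capacity_type  = group.capacity_type

      # Illustrative labels; set on the node group rather than through kubelet flags.
      labels = {
        environment = var.environment
        nodegroup   = group_name
      }
    }
  }
}
```

Passing `eks_managed_node_groups = local.eks_managed_node_groups` to the EKS module would then let Dev and Prod differ only in the values their roots supply.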
Apply Workflow
Run Terraform from the environment leaf. Production applies should normally happen through CI with protected approvals.
cd terraform/envs/dev
terraform init
terraform validate
terraform plan -out=tfplan
terraform apply tfplan
cd ../prod
terraform init
terraform plan -out=tfplan
# Apply prod only after review and approval.
terraform apply tfplan