Terraform Dev/Prod Cluster Scaffold

TL;DR

Keep Dev and Prod as separate Terraform roots with separate backend state. Put reusable platform logic in modules/cluster, and let each environment pass only its own account, region, naming, sizing, and feature flags into that module.

Directory Layout

Use this scaffold when the cluster is the main platform unit and the same blueprint should create both Dev and Prod without sharing state.

terraform-tree.txt

terraform/
  modules/
    cluster/
      main.tf          # VPC, cluster, node pools, IAM, add-on dependencies.
      variables.tf     # Stable inputs environments are allowed to change.
      outputs.tf       # Cluster name, endpoint, OIDC issuer, subnet IDs.
      templates/
        control-plane-user-data.sh.tftpl
        worker-user-data.sh.tftpl
        eks-node-user-data.sh.tftpl
  envs/
    dev/
      versions.tf
      providers.tf
      backend.tf       # State key such as clusters/dev/terraform.tfstate.
      main.tf          # Calls ../../modules/cluster.
      terraform.tfvars # Small node counts, cheaper instance classes.
    prod/
      versions.tf
      providers.tf
      backend.tf       # State key such as clusters/prod/terraform.tfstate.
      main.tf
      terraform.tfvars # HA sizing, tighter access, production tags.

Environment Roots

Each environment root is deployable by itself. Do not run Terraform from modules/cluster; run it from envs/dev or envs/prod.

envs/dev/main.tf

module "cluster" {
  source = "../../modules/cluster"

  environment     = "dev"
  name            = "platform-dev"
  region          = "us-east-1"
  cluster_version = "1.30"

  node_groups = {
    general = {
      min_size       = 1
      max_size       = 3
      desired_size   = 1
      instance_types = ["t3.large"]
      capacity_type  = "SPOT"
    }
  }

  tags = {
    CostCenter = "platform-dev"
  }
}
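
The directory layout places sizing in terraform.tfvars, while the root above hard-codes it. One way to reconcile the two, shown as a sketch: declare a root-level variable and forward it to the module. The file envs/dev/variables.tf is an assumption here and does not appear in the tree.

```hcl
# envs/dev/variables.tf (hypothetical file; not part of the tree above)
variable "node_groups" {
  description = "Node group sizing for this environment."
  type = map(object({
    min_size       = number
    max_size       = number
    desired_size   = number
    instance_types = list(string)
    capacity_type  = string
  }))
}

# envs/dev/terraform.tfvars
node_groups = {
  general = {
    min_size       = 1
    max_size       = 3
    desired_size   = 1
    instance_types = ["t3.large"]
    capacity_type  = "SPOT"
  }
}

# envs/dev/main.tf would then forward the value instead of hard-coding it:
#   node_groups = var.node_groups
```

With this split, a sizing change is a one-file diff in terraform.tfvars and the module call stays identical across environments.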

envs/prod/main.tf

module "cluster" {
  source = "../../modules/cluster"

  environment     = "prod"
  name            = "platform-prod"
  region          = "us-east-1"
  cluster_version = "1.30"

  node_groups = {
    general = {
      min_size       = 3
      max_size       = 8
      desired_size   = 3
      instance_types = ["m6i.large"]
      capacity_type  = "ON_DEMAND"
    }
  }

  tags = {
    CostCenter = "platform-prod"
    Criticality = "high"
  }
}
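
The tree lists versions.tf and providers.tf in each root without showing them. A minimal sketch for the Prod root follows, assuming the AWS provider; the version constraints are illustrative assumptions, not values taken from this scaffold.

```hcl
# envs/prod/versions.tf (sketch; version constraints are assumptions)
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

# envs/prod/providers.tf (sketch)
provider "aws" {
  region = "us-east-1"

  # default_tags applies these tags to every taggable AWS resource in this root.
  default_tags {
    tags = {
      Environment = "prod"
    }
  }
}
```

Setting the environment tag through default_tags is one way to keep tagging consistent without repeating it on each resource.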

Separate State

The most important isolation boundary is state. Dev and Prod may share a backend bucket, but each needs its own state key at minimum. Add separate workspaces, credentials, or cloud accounts as the client's risk model requires.

envs/prod/backend.tf

terraform {
  backend "s3" {
    bucket         = "client-terraform-state"
    key            = "clusters/prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
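
For comparison, the Dev root would mirror this file with only the key changed, following the clusters/dev/terraform.tfstate convention from the directory layout. The bucket and lock table are reused here on the assumption that both environments share one backend.

```hcl
# envs/dev/backend.tf (sketch; same bucket and lock table, different key)
terraform {
  backend "s3" {
    bucket         = "client-terraform-state"
    key            = "clusters/dev/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
```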

Cluster Module Contract

The module contract should expose environment differences without leaking implementation details. Keep the inputs boring, typed, and reviewable.

modules/cluster/variables.tf

variable "environment" {
  type        = string
  description = "Environment name, such as dev or prod."
}

variable "name" {
  type        = string
  description = "Cluster and platform resource name prefix."
}

variable "region" {
  type        = string
  description = "Cloud region for this cluster."
}

variable "cluster_version" {
  type        = string
  description = "Kubernetes control plane version."
}

variable "node_groups" {
  description = "Managed node group definitions keyed by workload class."
  type = map(object({
    min_size       = number
    max_size       = number
    desired_size   = number
    instance_types = list(string)
    capacity_type  = string
  }))
}

variable "tags" {
  type        = map(string)
  description = "Additional tags applied to cluster resources."
  default     = {}
}
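
Typed inputs can also reject bad values at plan time. The sketch below shows how the environment and node_groups variables might be declared with validation blocks. The allowed environment names and the sizing rule are assumptions for illustration.

```hcl
# Variant of modules/cluster/variables.tf with validation (sketch).
variable "environment" {
  type        = string
  description = "Environment name, such as dev or prod."

  validation {
    # Assumption: only these two environments exist.
    condition     = contains(["dev", "prod"], var.environment)
    error_message = "The environment value must be either dev or prod."
  }
}

variable "node_groups" {
  description = "Managed node group definitions keyed by workload class."
  type = map(object({
    min_size       = number
    max_size       = number
    desired_size   = number
    instance_types = list(string)
    capacity_type  = string
  }))

  validation {
    # Every group must use a known capacity type and consistent sizing.
    condition = alltrue([
      for group in values(var.node_groups) :
      contains(["ON_DEMAND", "SPOT"], group.capacity_type) &&
      group.min_size <= group.desired_size &&
      group.desired_size <= group.max_size
    ])
    error_message = "Each node group needs capacity_type ON_DEMAND or SPOT and min_size <= desired_size <= max_size."
  }
}
```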

Node Bootstrap Templates

Use *.tftpl files when Terraform creates compute that must bootstrap itself at first boot. For kubeadm-style clusters, control-plane and worker nodes need different user data. For managed EKS node groups, user data is usually lighter because EKS owns the control plane and the EKS-optimized AMI already knows the bootstrap flow.

modules/cluster/main.tf

locals {
  control_plane_user_data = templatefile("${path.module}/templates/control-plane-user-data.sh.tftpl", {
    project_name = var.name
    node_index   = 0
  })

  worker_user_data = templatefile("${path.module}/templates/worker-user-data.sh.tftpl", {
    project_name = var.name
    node_index   = 0
  })
}
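
The locals above render a single script with node_index fixed at 0. When several control-plane nodes are needed, one script per node can be rendered with a for expression. This is a sketch: the control_plane_count variable is hypothetical and is not part of the module contract shown earlier.

```hcl
# Hypothetical input; not part of the variables.tf shown earlier.
variable "control_plane_count" {
  type        = number
  description = "Number of self-managed control-plane nodes."
  default     = 3
}

locals {
  # One rendered script per node, each with its own node_index.
  control_plane_user_data_by_index = [
    for index in range(var.control_plane_count) :
    templatefile("${path.module}/templates/control-plane-user-data.sh.tftpl", {
      project_name = var.name
      node_index   = index
    })
  ]
}
```

Each element of the list can then be assigned to the matching instance, for example by indexing with count.index.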

kubeadm Control Plane User Data

This template prepares a self-managed control-plane host: disables swap, loads Kubernetes networking modules, installs containerd and kubeadm tooling, and writes a next-steps helper for kubeadm init. The kube-vip manifest supports an HA control-plane endpoint when multiple control-plane nodes are used.

templates/control-plane-user-data.sh.tftpl

#!/bin/bash
set -euxo pipefail

exec > >(tee /var/log/${project_name}-control-plane-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1

export DEBIAN_FRONTEND=noninteractive
KUBERNETES_VERSION="v1.30"
NODE_ROLE="control-plane"
NODE_INDEX="${node_index}"

echo "Starting bootstrap for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"

swapoff -a || true
sed -ri 's/^\s*([^#].*\sswap\s+sw\s+.*)$/# \1/' /etc/fstab || true

cat >/etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

cat >/etc/sysctl.d/99-kubernetes-cri.conf <<'EOF'
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

apt-get update
apt-get install -y apt-transport-https ca-certificates curl gpg containerd

mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable --now containerd

mkdir -p /etc/apt/keyrings
curl -fsSL "https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/Release.key" \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/ /" \
  >/etc/apt/sources.list.d/kubernetes.list

apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl
systemctl enable kubelet

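# Example values; replace VIP with an unused address in the control-plane subnet
# and INTERFACE with the name of the host's primary network interface.
export VIP=172.31.1.100
export INTERFACE=eth0
export KVVERSION="v1.2.0-rc.0"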
ctr image pull "ghcr.io/kube-vip/kube-vip:$${KVVERSION}"
mkdir -p /etc/kubernetes/manifests
ctr run --rm --net-host "ghcr.io/kube-vip/kube-vip:$${KVVERSION}" vip /kube-vip manifest pod --interface "$${INTERFACE}" --address "$${VIP}" --arp --controlplane --services --leaderElection | tee /etc/kubernetes/manifests/kube-vip.yaml

cat >/usr/local/bin/bootstrap-control-plane-next-steps.sh <<'EOF'
#!/bin/bash
set -euo pipefail

cat <<'MSG'
Control plane prerequisites are installed.

Suggested next steps on the first control-plane node:
1. Run kubeadm init with --control-plane-endpoint using the VIP address.
2. Configure kubectl from /etc/kubernetes/admin.conf.
3. Install Cilium or another CNI.
4. Validate kube-system Pods and node readiness.
5. Generate worker and additional control-plane join commands.
MSG
EOF
chmod +x /usr/local/bin/bootstrap-control-plane-next-steps.sh

echo "Bootstrap complete for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"

kubeadm Worker User Data

The worker template installs the same Linux, container runtime, and kubelet prerequisites but does not join the cluster. Joining requires the token and CA certificate hash generated on an initialized control-plane node.

templates/worker-user-data.sh.tftpl

#!/bin/bash
set -euxo pipefail

exec > >(tee /var/log/${project_name}-worker-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1

export DEBIAN_FRONTEND=noninteractive
KUBERNETES_VERSION="v1.30"
NODE_ROLE="worker"
NODE_INDEX="${node_index}"

echo "Starting bootstrap for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"

swapoff -a || true
sed -ri 's/^\s*([^#].*\sswap\s+sw\s+.*)$/# \1/' /etc/fstab || true

cat >/etc/modules-load.d/k8s.conf <<'EOF'
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter

cat >/etc/sysctl.d/99-kubernetes-cri.conf <<'EOF'
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sysctl --system

apt-get update
apt-get install -y apt-transport-https ca-certificates curl gpg containerd

mkdir -p /etc/containerd
containerd config default >/etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
systemctl enable --now containerd

mkdir -p /etc/apt/keyrings
curl -fsSL "https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/Release.key" \
  | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/$${KUBERNETES_VERSION}/deb/ /" \
  >/etc/apt/sources.list.d/kubernetes.list

apt-get update
apt-get install -y kubelet kubeadm
apt-mark hold kubelet kubeadm
systemctl enable kubelet

cat >/usr/local/bin/bootstrap-worker-next-steps.sh <<'EOF'
#!/bin/bash
set -euo pipefail

cat <<'MSG'
Worker prerequisites are installed.

Next step:
Run the join command generated from the control-plane node:
  sudo kubeadm join <api-server-endpoint>:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
MSG
EOF
chmod +x /usr/local/bin/bootstrap-worker-next-steps.sh

echo "Bootstrap complete for ${project_name} $${NODE_ROLE} node $${NODE_INDEX}"

Managed EKS Node Group User Data

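For standard managed EKS node groups, prefer the EKS-optimized AMI and managed node group defaults. Add custom user data only for small host-level setup such as package mirrors, labels, taints, kubelet flags, or log agent prerequisites. When the launch template does not specify an AMI ID, EKS merges its own bootstrap step into the user data, so on Amazon Linux 2 the script must be supplied in MIME multi-part format and must not call /etc/eks/bootstrap.sh itself. The template that follows calls bootstrap.sh directly, so it applies to launch templates that pin an Amazon Linux 2 EKS-optimized AMI ID; Amazon Linux 2023 images use nodeadm instead.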

templates/eks-node-user-data.sh.tftpl

#!/bin/bash
set -euxo pipefail

exec > >(tee /var/log/${cluster_name}-eks-node-bootstrap.log | logger -t user-data -s 2>/dev/console) 2>&1

echo "Starting EKS managed node bootstrap for ${cluster_name}"

# Optional host-level preparation. Keep this small so EKS AMI bootstrap remains the owner.
yum install -y amazon-cloudwatch-agent || true

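# Call bootstrap.sh only when the launch template pins an Amazon Linux 2
# EKS-optimized AMI ID. Without an AMI ID, EKS adds its own bootstrap call
# and this invocation must be removed.
/etc/eks/bootstrap.sh "${cluster_name}" \
  --kubelet-extra-args "--node-labels=environment=${environment},nodegroup=${node_group_name}"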

echo "EKS managed node bootstrap complete for ${cluster_name}"

modules/cluster/eks-managed-node-group.tf

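resource "aws_launch_template" "managed_node_group" {
  # One launch template per node group so each renders its own node label.
  for_each = var.node_groups

  # var.name already includes the environment (for example "platform-dev").
  name_prefix = "${var.name}-${each.key}-"

  # User data that calls /etc/eks/bootstrap.sh also needs image_id set to an
  # Amazon Linux 2 EKS-optimized AMI; without it, EKS injects its own bootstrap.
  user_data = base64encode(templatefile("${path.module}/templates/eks-node-user-data.sh.tftpl", {
    cluster_name    = module.eks.cluster_name
    environment     = var.environment
    node_group_name = each.key
  }))
}

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = var.name
  cluster_version = var.cluster_version

  # vpc_id and subnet_ids must also be set from the VPC resources in main.tf.

  # Sizing comes from var.node_groups so Dev and Prod differ only by input.
  eks_managed_node_groups = {
    for group_name, group in var.node_groups : group_name => merge(group, {
      # Use the launch template above instead of the one the module would create.
      create_launch_template  = false
      launch_template_id      = aws_launch_template.managed_node_group[group_name].id
      launch_template_version = "$Latest"
    })
  }
}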
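
The directory layout describes outputs.tf as exposing the cluster name, endpoint, and OIDC issuer. A sketch of that file, assuming the module "eks" block from eks-managed-node-group.tf; the output names are illustrative choices. Subnet IDs are omitted because the VPC resources are not shown in this scaffold.

```hcl
# modules/cluster/outputs.tf (sketch; output names are illustrative)
output "cluster_name" {
  description = "Name of the EKS cluster."
  value       = module.eks.cluster_name
}

output "cluster_endpoint" {
  description = "Kubernetes API server endpoint."
  value       = module.eks.cluster_endpoint
}

output "oidc_issuer_url" {
  description = "OIDC issuer URL, used when wiring IAM roles for service accounts."
  value       = module.eks.cluster_oidc_issuer_url
}
```

An environment root can re-export any of these, for example as module.cluster.cluster_endpoint, so CI jobs read them with terraform output.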

Guardrails

Practice	Reason
Separate `envs/dev` and `envs/prod`	Plans, applies, and state are scoped to one environment.
Use separate backend keys	A Dev apply cannot mutate Prod state by accident.
Pass differences through variables	The shared module stays consistent while sizing and policy vary.
Tag every resource with environment	Cost, ownership, and incident review stay traceable.
Keep global resources in a separate root	DNS zones, registries, and shared IAM boundaries usually outlive clusters.
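
As an illustration of the last guardrail, an environment root can read values published by a separate global root instead of managing those resources itself. This is a sketch: the global state key and the dns_zone_id output are hypothetical names.

```hcl
# envs/prod/main.tf (sketch): read outputs published by a hypothetical global root.
data "terraform_remote_state" "global" {
  backend = "s3"

  config = {
    bucket = "client-terraform-state"
    key    = "global/terraform.tfstate" # hypothetical key for the global root
    region = "us-east-1"
  }
}

locals {
  # Assumes the global root declares an output named "dns_zone_id".
  dns_zone_id = data.terraform_remote_state.global.outputs.dns_zone_id
}
```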

Apply Workflow

Run Terraform from the environment leaf. Production applies should normally happen through CI with protected approvals.

env-workflow.sh

cd terraform/envs/dev
terraform init
terraform validate
terraform plan -out=tfplan
terraform apply tfplan

cd ../prod
terraform init
terraform validate
terraform plan -out=tfplan
# Apply prod only after review and approval.
terraform apply tfplan