TL;DR

Before fixing Terraform, capture evidence: the exact command, workspace and root module path, backend key, plan output, state address, cloud resource ID, and who last applied. Do not run force-unlock, state rm, import, apply -replace, or destroy until you know whether another apply is running and what owns the resource.

First Five Minutes

Run these commands first when a Terraform issue is reported. They confirm the working directory, Terraform version, workspace, providers, validation status, and visible state before you touch locks or apply changes.

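# terraform-first-look.sh (bash)
pwd                              # confirm you are in the intended root module
terraform version                # CLI version in use
terraform workspace show
terraform providers              # providers required by configuration and state
terraform init -reconfigure      # re-read backend config, ignoring saved backend settings
terraform validate
terraform plan -no-color
terraform state list | head -50  # first 50 addresses Terraform currently tracks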
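
The evidence called for in the TL;DR is easier to keep if each first-look command is logged as it runs. The following is a minimal sketch of one way to do that, not an established tool: `capture` and `EVIDENCE_FILE` are hypothetical names introduced here for illustration.

```shell
# Hypothetical helper: log each command, its combined output, and its exit code.
# EVIDENCE_FILE and capture are illustrative names, not Terraform features.
EVIDENCE_FILE="${EVIDENCE_FILE:-tf-evidence-$(date -u +%Y%m%dT%H%M%SZ).log}"

capture() {
  status=0
  {
    printf '### %s (cwd: %s)\n' "$*" "$PWD"
    "$@" 2>&1 || status=$?
    printf '### exit code: %s\n\n' "$status"
  } >>"$EVIDENCE_FILE"
}

# Usage (commented out so that sourcing this file runs nothing):
#   capture terraform version
#   capture terraform workspace show
#   capture terraform plan -no-color
```

Plan output can contain sensitive values, so treat the resulting log file with the same care as the plan itself.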

Symptom To Cause

| Symptom | Likely cause | First check |
|---|---|---|
| State lock error | Another apply or crashed job | CI run history, lock table/blob metadata |
| Plan wants to destroy many resources | Wrong workspace/backend/vars/provider account | Current account, backend key, tfvars |
| Provider auth failure | Expired token, wrong OIDC role, missing env vars | Cloud identity command and CI role |
| Resource already exists | Existing client resource not imported | Write block then terraform import |
| Perpetual diff | Cloud default, provider bug, ignored field needed | Provider docs and live resource values |
| Dependency cycle | Over-coupled module references | Graph dependencies and outputs |
| Plan recreates EKS/AKS/GKE | Immutable field changed | Check exact attribute forcing replacement |

State Lock

Use this when Terraform says it cannot acquire the state lock. Confirm no CI job or engineer is actively applying before using force-unlock, because unlocking a real apply can corrupt state.

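# lock-check.sh (bash)
# Do not force-unlock first. Confirm no pipeline or human apply is active.
# Re-running plan reproduces the error; its "Lock Info" block shows the lock ID, Who, and Created.
terraform plan

# If the lock is stale and the unlock is approved (LOCK_ID is the ID from Lock Info):
terraform force-unlock LOCK_ID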
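
Deciding whether a lock is stale usually comes down to reading the "Lock Info" block that Terraform prints with the lock error. The sketch below pulls single fields out of that text so they can be compared against CI run history. `lock_field` is a hypothetical helper, and it assumes the usual `Name: value` layout of that block; check it against the output of your Terraform version before relying on it.

```shell
# Hypothetical helper: extract one field (ID, Who, Created, Operation, ...)
# from Terraform's lock error text supplied on stdin.
# Assumes "Name:   value" lines, optionally preceded by spaces or a box-drawing bar.
lock_field() {
  sed -n "s/^[^A-Za-z]*$1:[[:space:]]*//p" | head -n 1
}

# Usage (commented out so that sourcing this file runs nothing):
#   terraform plan -no-color 2>lock.err
#   lock_field Who <lock.err       # who holds the lock
#   lock_field Created <lock.err   # when it was taken
#   lock_field ID <lock.err        # value to pass to force-unlock, once approved
```

A lock whose Who and Created values match a CI job that is still running is not stale, however long it has been held.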

Wrong Account Or Backend

Use these checks when a plan looks wildly wrong, especially if it wants to destroy many resources. Most scary Terraform plans come from the wrong cloud account, project, subscription, workspace, backend key, or tfvars file.

# identity-check.sh (bash)
# AWS
aws sts get-caller-identity

# Azure
az account show

# GCP
gcloud auth list
gcloud config get-value project

# Terraform context
terraform workspace show
terraform state pull | head   # top of the state file: format version, terraform_version, serial, lineage
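
Because the wrong account is such a common cause, the identity check can be turned into a guard that runs before any plan. This is a sketch under assumptions: `require_account` is a hypothetical function, and `111122223333` is a placeholder account ID, not a real one.

```shell
# Hypothetical guard: succeed only when the active account matches the one
# this root module is supposed to manage.
require_account() {
  expected=$1
  actual=$2
  if [ "$expected" != "$actual" ]; then
    printf 'STOP: expected account %s but credentials resolve to %s\n' "$expected" "$actual" >&2
    return 1
  fi
  printf 'OK: account %s\n' "$actual"
}

# Usage (commented out; 111122223333 is a placeholder):
#   require_account 111122223333 \
#     "$(aws sts get-caller-identity --query Account --output text)" \
#     && terraform plan -no-color
# The same comparison works with other identity commands, for example:
#   az account show --query id --output tsv
#   gcloud config get-value project
```

Keeping the expected ID next to the backend configuration for each environment makes the comparison hard to skip.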

Dangerous Commands

| Command | Risk | Safer first step |
|---|---|---|
| terraform destroy | Deletes managed infrastructure | Use plan review and approval |
| terraform state rm | Orphans resource from Terraform | Understand why state/code disagree |
| terraform import | Can attach wrong real resource to code | Verify resource ID and address |
| terraform force-unlock | Can corrupt concurrent apply | Check active CI/human runs |
| terraform apply -replace | Recreates resource, may cause outage | Confirm replacement blast radius |
| terraform apply -target | Can skip dependencies | Use only for documented emergency scope |

Drift Investigation

Use a refresh-only plan to see how live cloud resources differ from state. It proposes updates to state only, not changes to infrastructure. After that, decide whether to update Terraform code, revert the manual change, or formally accept the drift.

# drift.sh (bash)
terraform plan -refresh-only -out=refresh.tfplan
terraform show refresh.tfplan

# If drift is expected, update Terraform code.
# If drift is emergency manual work, decide whether to keep it or revert it.
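
For scheduled drift checks, the plan's exit status can stand in for reading the output by hand. The sketch below assumes `-detailed-exitcode` behaves with `-refresh-only` as it does for a normal plan (0 for no changes, 1 for an error, 2 for changes present); confirm that on your Terraform version first. `describe_plan_exit` is a hypothetical helper.

```shell
# Hypothetical helper: translate the exit status of
# `terraform plan -detailed-exitcode` into a one-line verdict.
describe_plan_exit() {
  case "$1" in
    0) echo "no drift: state matches live resources" ;;
    1) echo "error: plan failed, fix that before judging drift" ;;
    2) echo "drift detected: review the plan and decide keep or revert" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

# Usage (commented out so that sourcing this file runs nothing):
#   terraform plan -refresh-only -detailed-exitcode -no-color
#   describe_plan_exit "$?"
```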

Import Recovery

Use import recovery when a real resource exists but Terraform does not track it, or when state was lost for a known object. Verify the resource ID and Terraform address carefully before importing.

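# import-recovery.sh (bash)
# Confirm the address is not already tracked (rg is ripgrep; grep works too).
terraform state list | rg app_lb
# The resource block must already exist in code at this address.
# Argument order: Terraform address first, then the real resource ID.
terraform import 'module.network.aws_security_group.app_lb' sg-0123456789abcdef0
terraform plan

# Tune code until plan does not try to replace the imported object.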
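
A substring search of the state list can match similarly named addresses, so an exact-match check is a safer way to verify the address before importing. This is a minimal sketch; `in_state` is a hypothetical helper that reads `terraform state list` output on stdin.

```shell
# Hypothetical helper: succeed only if the given address appears in the
# state listing on stdin as an exact, whole-line match.
in_state() {
  grep -Fxq -- "$1"
}

# Usage (commented out so that sourcing this file runs nothing):
#   addr='module.network.aws_security_group.app_lb'
#   if terraform state list | in_state "$addr"; then
#     echo "already tracked: do not import $addr"
#   else
#     echo "not tracked: safe to consider importing $addr"
#   fi
```

The check covers only the Terraform side. The resource ID still has to be confirmed against the cloud account by hand.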

Provider Errors

| Error | Check |
|---|---|
| AWS InvalidClientTokenId | Expired/wrong AWS credentials, STS denied, wrong profile |
| Azure authorization failed | Missing role assignment, wrong subscription, stale login |
| GCP 403 | API disabled, service account missing IAM role, wrong project |
| Kubernetes provider connection refused | Cluster not created yet, kubeconfig missing, private endpoint unreachable |
| Helm provider timeout | Chart resources unhealthy; inspect Kubernetes events and Helm release |

Before Applying In Production

  1. Confirm the backend key, workspace, cloud account, region, and tfvars are correct.
  2. Read the plan for destroy/replace lines, not just the summary count.
  3. Check whether the resource is owned by Terraform, Helm, ArgoCD, a cloud console process, or another repo.
  4. Take extra care with VPCs, subnets, IAM roles, DNS zones, state buckets, cluster resources, and databases.
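
Reading the plan for destroy and replace lines can be partly mechanized by filtering the resource header lines of the plan text. The sketch below assumes the header wording "will be destroyed" and "must be replaced" used by current Terraform plan output; `risky_changes` is a hypothetical helper. For anything automated, the structured output of `terraform show -json` is likely a sturdier basis than text matching.

```shell
# Hypothetical helper: read `terraform plan -no-color` output on stdin and
# print the header line of every resource that would be destroyed or replaced.
# Prints nothing, and still succeeds, when there are no such lines.
risky_changes() {
  grep -E '^[[:space:]]*# .* (will be destroyed|must be replaced)' || true
}

# Usage (commented out so that sourcing this file runs nothing):
#   terraform plan -no-color | risky_changes
```

An empty result does not make the plan safe. It only means no resource header matched, so the full plan still needs a human read.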