Infrastructure as Code with Terraform on AWS: Best Practices for Production

Introduction

Infrastructure as Code (IaC) has transformed how organizations provision and manage cloud resources. Terraform by HashiCorp stands out for its multi-cloud support, declarative syntax, and powerful state management. However, moving from proof-of-concept Terraform scripts to production-grade infrastructure requires adopting security, scalability, and team collaboration patterns.

In this blog, we will cover battle-tested best practices for using Terraform with AWS in production environments, including state management, modular design, security, CI/CD automation, and drift detection.

Prerequisites

An active AWS account with AWS IAM permissions for resource creation
Terraform CLI v1.5+ installed locally
AWS CLI configured with credentials
Basic understanding of HCL (HashiCorp Configuration Language)
Git repository for version control
An Amazon S3 bucket and Amazon DynamoDB table for remote state (we will create these)

Step-by-Step Guide

Step 1: Organizing Your Project Structure

A well-structured Terraform project separates environments, uses reusable modules, and isolates state files to minimize blast radius:

infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── production/
├── modules/
│   ├── networking/
│   ├── compute/
│   ├── database/
│   └── monitoring/
└── global/
    ├── iam/
    └── dns/

Each environment maintains its own state file, so a plan or apply run in development never reads or writes production state. Global resources, such as AWS IAM roles and DNS zones, are managed separately because they are shared across environments.

Step 2: Configuring Remote State with Locking

Never store Terraform state locally in production. Use Amazon S3 for storage and Amazon DynamoDB for locking so that the whole team works from one state file and two concurrent runs cannot corrupt it:

# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/networking/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    kms_key_id     = "arn:aws:kms:ap-south-1:123456789012:key/mrk-abc123"
  }
}

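Enable versioning on the state bucket so that an earlier state file can be restored after corruption or accidental deletion. The Amazon DynamoDB table needs only a partition key named LockID of type String, and pay-per-request billing is sufficient because lock traffic is minimal. On Terraform 1.10 and later, the S3 backend can also lock natively with use_lockfile = true, which removes the need for the table.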
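The backend above assumes the bucket and lock table already exist. A minimal sketch of creating them is shown here; the bootstrap/ location is a hypothetical choice, and this configuration would be applied once with local state before any other stack is initialized. The resource names tf_state and tf_lock are illustrative.

```hcl
# bootstrap/main.tf (hypothetical location; applied once with local state)

resource "aws_s3_bucket" "tf_state" {
  bucket = "mycompany-terraform-state"

  # Refuse any plan that would delete the bucket holding every state file
  lifecycle {
    prevent_destroy = true
  }
}

# Versioning keeps earlier copies of each state file for recovery
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# State can contain secrets, so block every form of public access
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: the S3 backend expects a string partition key named LockID
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```

Keeping these resources in their own small configuration avoids the circular dependency of a stack that stores its state in a bucket it also manages.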

Step 3: Building Reusable Modules

Modules encapsulate related resources into testable, reusable units. A good module hides implementation complexity and exposes configuration through variables:

# modules/networking/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.common_tags, {
    Name = "${var.project}-${var.environment}-vpc"
  })
}

# environments/production/main.tf
module "networking" {
  source = "../../modules/networking"

  vpc_cidr    = "10.0.0.0/16"
  project     = "orderplatform"
  environment = "production"
  common_tags = local.common_tags
}
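The module above reads four input variables that are not shown. A possible variables.tf and outputs.tf for it might look like the following sketch; the validation rules and the vpc_id output are assumptions added for illustration, not part of the original module.

```hcl
# modules/networking/variables.tf (illustrative sketch)

variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    # cidrhost() fails on malformed input, and can() turns that failure into false
    condition     = can(cidrhost(var.vpc_cidr, 0))
    error_message = "vpc_cidr must be a valid CIDR block, for example 10.0.0.0/16."
  }
}

variable "project" {
  description = "Project name used as a prefix in resource names"
  type        = string
}

variable "environment" {
  description = "Deployment environment"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, production."
  }
}

variable "common_tags" {
  description = "Tags merged into every resource created by this module"
  type        = map(string)
  default     = {}
}

# modules/networking/outputs.tf (illustrative sketch)

output "vpc_id" {
  description = "ID of the VPC, for use by compute and database modules"
  value       = aws_vpc.main.id
}
```

Validation blocks surface a bad value at plan time with a readable message, rather than as an AWS API error halfway through an apply.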

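The example above uses a local path, which always tracks whatever is in the current checkout. When modules live in a separate Git repository, pin the source to a tag (e.g., ?ref=v1.2.0) so that upstream changes cannot break deployments unexpectedly.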
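As a sketch of what a pinned source could look like, the module call below pulls the networking module from a Git repository at a fixed tag. The repository URL is hypothetical, and the version constraints in the terraform block are assumptions that apply the same pinning idea to the CLI and the AWS provider.

```hcl
# environments/production/main.tf (alternative form of the module call)

terraform {
  # Assumed constraints; choose the versions your team has tested
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

module "networking" {
  # Hypothetical repository. "//networking" selects a subdirectory of the
  # repository, and "?ref=v1.2.0" pins the module to that Git tag.
  source = "git::https://github.com/myorg/terraform-modules.git//networking?ref=v1.2.0"

  vpc_cidr    = "10.0.0.0/16"
  project     = "orderplatform"
  environment = "production"
  common_tags = local.common_tags
}
```

Upgrading then becomes an explicit, reviewable change: bump the ref in a pull request and read the resulting plan.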

Step 4: Implementing a Tagging Strategy

Consistent tags enable cost allocation, automation, and compliance auditing. Use the AWS provider’s default_tags block to apply a baseline set of tags automatically to every taggable resource that the provider manages:

provider "aws" {
  region = "ap-south-1"
  default_tags {
    tags = {
      Project     = "OrderPlatform"
      Environment = var.environment
      ManagedBy   = "terraform"
      Owner       = "platform-team"
      CostCenter  = "engineering"
      Repository  = "github.com/myorg/infra"
    }
  }
}

This keeps untagged resources out of cost reports and simplifies resource identification during incident response.
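Individual resources can still add or override tags on top of the provider defaults. The sketch below assumes the provider block shown earlier; the bucket name and the DataClassification tag are hypothetical.

```hcl
resource "aws_s3_bucket" "reports" {
  bucket = "orderplatform-reports-example" # hypothetical bucket name

  # Resource-level tags are merged with default_tags. When the same key
  # appears in both, the resource-level value takes precedence.
  tags = {
    Owner              = "data-team"
    DataClassification = "internal"
  }
}

# tags_all exposes the merged result of default_tags plus resource tags
output "reports_bucket_tags" {
  value = aws_s3_bucket.reports.tags_all
}
```

Under these assumptions, the bucket would carry Project, Environment, ManagedBy, CostCenter, and Repository from the provider, with Owner replaced by the resource-level value.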

Step 5: Managing Secrets Securely

Terraform state files store resource attributes in plaintext JSON, including passwords and any secret read through a data source. Protect sensitive data with these practices:

Encrypt state at rest using AWS KMS customer-managed keys
Reference secrets from AWS Secrets Manager instead of variables:

data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/master-password"
}

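resource "aws_db_instance" "main" {
  engine            = "postgres"
  instance_class    = "db.r6g.large"
  allocated_storage = 100 # illustrative size in GiB; required for a new non-Aurora instance
  # "admin" is a reserved word for the RDS PostgreSQL master username
  username          = "dbadmin"
  password          = data.aws_secretsmanager_secret_version.db_password.secret_string

  # Do not reset the password from Terraform after it is rotated in Secrets Manager
  lifecycle {
    ignore_changes = [password]
  }
}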

Never commit .tfvars files containing sensitive values to Git
Use AWS IAM roles (not access keys) for Terraform execution in CI/CD
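When a secret does have to pass through a variable, marking it as sensitive keeps it out of plan and apply output. The following is a minimal sketch; the variable name, the parameter path, and the idea of supplying the value through the TF_VAR_api_token environment variable are all assumptions for illustration.

```hcl
variable "api_token" {
  description = "Hypothetical third-party API token, supplied via TF_VAR_api_token"
  type        = string
  sensitive   = true # redacts the value in CLI output, not in state
}

resource "aws_ssm_parameter" "api_token" {
  name  = "/production/orderplatform/api-token" # hypothetical parameter path
  type  = "SecureString"
  value = var.api_token
}

# Expose only a reference to the secret, never the value itself
output "api_token_parameter_arn" {
  value = aws_ssm_parameter.api_token.arn
}
```

The sensitive flag only affects what Terraform prints. The value is still written to the state file, so state encryption and tight access control on the state bucket remain necessary.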

Step 6: Automating with CI/CD Pipelines

Production Terraform should never be executed from developer laptops. Implement a pipeline that enforces review and security scanning:

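# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]

defaults:
  run:
    # Assumed working directory; point this at the environment being deployed
    working-directory: infrastructure/environments/production

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credentials (ideally an assumed IAM role) must be configured before init; omitted here
      - run: terraform init
      - run: terraform validate
      - run: terraform plan -out=tfplan
      - run: pip install checkov
      - run: checkov -d . --framework terraform
      # Jobs do not share a filesystem, so hand the saved plan to the apply job
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: infrastructure/environments/production/tfplan

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      # AWS credentials must be configured here as well; omitted here
      - uses: actions/download-artifact@v4
        with:
          name: tfplan
          path: infrastructure/environments/production
      - run: terraform init
      # A saved plan file is applied without prompting, so -auto-approve is not needed
      - run: terraform apply tfplan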

Combined with branch protection on main and required reviewers on the production environment, this ensures every infrastructure change is reviewed via pull request, scanned for security issues, and leaves a full audit trail.
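To run the pipeline with an IAM role rather than long-lived access keys, the role can trust GitHub's OIDC identity provider. The sketch below assumes that the OIDC provider for token.actions.githubusercontent.com is already registered in the account, that the repository is myorg/infra, and that the apply job runs in a GitHub environment named production. The role name terraform-ci is hypothetical.

```hcl
# Look up the GitHub OIDC provider (assumed to exist in this account already)
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "github_assume_role" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    # Accept only tokens issued for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Accept only jobs from this repository that run in the "production"
    # environment; such jobs carry a subject claim of this form
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values   = ["repo:myorg/infra:environment:production"]
    }
  }
}

resource "aws_iam_role" "terraform_ci" {
  name               = "terraform-ci" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.github_assume_role.json
}
```

Permissions policies still need to be attached to the role, scoped to what the stack actually manages. A separate read-only role for the plan job on pull requests is a reasonable extension, since those jobs do not run in the production environment and would not match the subject condition above.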

Step 7: Detecting and Managing Drift

Manual console changes create drift that undermines IaC governance. Schedule regular drift detection:

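# Run as a cron job or scheduled pipeline
terraform plan -detailed-exitcode

# Exit code 0 = no changes (infrastructure matches config)
# Exit code 1 = the plan failed with an error (investigate before trusting the result)
# Exit code 2 = changes pending; with unchanged config this means drift (alert the team)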

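When drift is detected, decide whether to adopt the manual change, by updating the configuration or importing a resource that was created outside Terraform, or to run terraform apply and revert the infrastructure to the declared state.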
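For a resource that someone created by hand, Terraform 1.5 and later can adopt it declaratively with an import block. The following is a sketch under assumed names: the security group ID, the resource name bastion, and the vpc_id variable are all hypothetical.

```hcl
variable "vpc_id" {
  description = "ID of the VPC that contains the manually created security group"
  type        = string
}

# Tell Terraform that an existing object belongs to this resource address
import {
  to = aws_security_group.bastion
  id = "sg-0123456789abcdef0" # hypothetical ID of the manually created group
}

# The configuration must describe the object as it exists today; any
# difference shows up as a change in the next plan
resource "aws_security_group" "bastion" {
  name        = "bastion-ssh"
  description = "SSH access to the bastion host"
  vpc_id      = var.vpc_id
}
```

The next terraform plan reports the import alongside any attribute differences, so the adoption goes through the same review as any other change. If writing the resource block by hand is impractical, omitting it and running terraform plan -generate-config-out=generated.tf asks Terraform to draft one, which should be reviewed before it is committed.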

Common Mistakes to Avoid

Monolithic state files – Split state by domain (networking, compute, database) to reduce blast radius
Missing state locking – Always configure DynamoDB locking to prevent corruption
Hardcoded values – Use variables and data sources for account IDs, AMIs, and region names
Skipping plan review – Always review terraform plan output before applying changes
Unpinned module versions – Pin to specific Git tags to prevent breaking changes
No prevent_destroy on critical resources – Protect databases, Amazon S3 buckets, and AWS KMS keys from accidental deletion
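Two of these points can be sketched together: looking up the account ID through a data source instead of hardcoding it, and guarding critical resources with prevent_destroy. The resource names and the bucket naming scheme below are assumptions for illustration.

```hcl
# Resolve the account ID at plan time instead of hardcoding it
data "aws_caller_identity" "current" {}

resource "aws_kms_key" "state" {
  description             = "Encrypts Terraform state"
  deletion_window_in_days = 30
  enable_key_rotation     = true

  # Any plan that would destroy this key fails with an error
  lifecycle {
    prevent_destroy = true
  }
}

resource "aws_s3_bucket" "audit_logs" {
  # Hypothetical naming scheme that embeds the looked-up account ID
  bucket = "audit-logs-${data.aws_caller_identity.current.account_id}"

  lifecycle {
    prevent_destroy = true
  }
}
```

Note that prevent_destroy only guards a resource while its block remains in the configuration. Deleting the block removes the guard along with it, so plan review is still the last line of defense.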

Conclusion

Terraform on AWS delivers tremendous value when operated with production discipline. Remote state with locking, modular architecture, automated CI/CD pipelines, security scanning, and drift detection form the essential foundation.

Organizations adopting these practices gain repeatable deployments, full audit history, team collaboration, and the confidence to evolve infrastructure at the speed their business demands.

Drop a query if you have any questions regarding Terraform, and we will get back to you quickly.

About CloudThat

CloudThat is an award-winning company and the first in India to offer cloud training and consulting services worldwide. As an AWS Premier Tier Services Partner, AWS Advanced Training Partner, Microsoft Solutions Partner, and Google Cloud Platform Partner, CloudThat has empowered over 1.1 million professionals through 1000+ cloud certifications, winning global recognition for its training excellence, including 20 MCT Trainers in Microsoft’s Global Top 100 and an impressive 14 awards in the last 9 years. CloudThat specializes in Cloud Migration, Data Platforms, DevOps, Security, IoT, and advanced technologies like Gen AI & AI/ML. It has delivered over 750 consulting projects for 850+ organizations in 30+ countries as it continues to empower professionals and enterprises to thrive in the digital-first world.

FAQs

1. Why is remote state management important when using Terraform in production?

ANS: – Remote state management enables multiple team members and automation pipelines to work with the same infrastructure safely. Storing state in a centralized location such as Amazon S3, combined with state locking, prevents conflicting updates and reduces the risk of infrastructure inconsistencies. It also provides backup and recovery capabilities through versioned state files.

2. How do Terraform modules improve infrastructure management?

ANS: – Terraform modules promote reusability, standardization, and maintainability by grouping related resources into reusable building blocks. Teams can deploy consistent infrastructure across environments while reducing duplicate code. Modules also simplify updates because changes can be implemented in a single location and reused wherever the module is referenced.

3. What is infrastructure drift, and why should organizations monitor it?

ANS: – Infrastructure drift occurs when deployed cloud resources no longer match the configuration defined in Terraform code, often due to manual changes made through the AWS Management Console or other tools. Regular drift detection helps maintain governance, ensures infrastructure remains compliant with organizational standards, and prevents unexpected behavior during future deployments.

WRITTEN BY Samarth Kulkarni

Samarth is a Senior Research Associate and AWS-certified professional with hands-on expertise in over 25 successful cloud migration, infrastructure optimization, and automation projects. With a strong track record in architecting secure, scalable, and cost-efficient solutions, he has delivered complex engagements across AWS, Azure, and GCP for clients in diverse industries. Recognized multiple times by clients and peers for his exceptional commitment, technical expertise, and proactive problem-solving, Samarth leverages tools such as Terraform, Ansible, and Python automation to design and implement robust cloud architectures that align with both business and technical objectives.