Introduction
Infrastructure as Code (IaC) has transformed how organizations provision and manage cloud resources. Terraform by HashiCorp stands out for its multi-cloud support, declarative syntax, and powerful state management. However, moving from proof-of-concept Terraform scripts to production-grade infrastructure requires adopting patterns for security, scalability, and team collaboration.
In this blog, we will cover battle-tested best practices for using Terraform with AWS in production environments, including state management, modular design, security, CI/CD automation, and drift detection.
Prerequisites
- An active AWS account with AWS IAM permissions for resource creation
- Terraform CLI v1.5+ installed locally
- AWS CLI configured with credentials
- Basic understanding of HCL (HashiCorp Configuration Language)
- Git repository for version control
- An Amazon S3 bucket and Amazon DynamoDB table for remote state (we will create these)
Step-by-Step Guide
Step 1: Organizing Your Project Structure
A well-structured Terraform project separates environments, uses reusable modules, and isolates state files to minimize blast radius:
```
infrastructure/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── terraform.tfvars
│   │   └── backend.tf
│   ├── staging/
│   └── production/
├── modules/
│   ├── networking/
│   ├── compute/
│   ├── database/
│   └── monitoring/
└── global/
    ├── iam/
    └── dns/
```
Each environment maintains its own state file, so an apply in dev only touches resources tracked in dev's state and cannot accidentally destroy production resources. Global resources, such as AWS IAM roles and DNS zones, are managed separately in their own stacks under global/.
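Because global resources live in separate state, an environment needs a way to read their outputs. One option is the terraform_remote_state data source, sketched below. It assumes the global/iam stack stores its state in the same bucket under the key global/iam/terraform.tfstate and declares an output named deploy_role_arn; both the key and the output name are hypothetical.

```hcl
# environments/production/main.tf (sketch)
# Reads outputs published by the separately managed global/iam stack.
# The state key and the "deploy_role_arn" output are hypothetical.
data "terraform_remote_state" "iam" {
  backend = "s3"

  config = {
    bucket = "mycompany-terraform-state"
    key    = "global/iam/terraform.tfstate"
    region = "ap-south-1"
  }
}

locals {
  # Only values declared as outputs in global/iam are visible here
  deploy_role_arn = data.terraform_remote_state.iam.outputs.deploy_role_arn
}
```

Note that reading remote state requires read access to the entire global/iam state file; if that is too broad, publishing selected values to a store such as AWS Systems Manager Parameter Store is a common alternative.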
Step 2: Configuring Remote State with Locking
Never store Terraform state locally in production. Use Amazon S3 with Amazon DynamoDB locking to enable team collaboration and to prevent state corruption from concurrent modifications:
```hcl
# backend.tf
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "production/networking/terraform.tfstate"
    region         = "ap-south-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
    kms_key_id     = "arn:aws:kms:ap-south-1:123456789012:key/mrk-abc123"
  }
}
```
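Enable versioning on the state bucket so you can recover an earlier state file after corruption. The Amazon DynamoDB table only needs a single partition key named LockID of type String, with on-demand (pay-per-request) billing. On Terraform 1.11 and later, the S3 backend can also lock natively with use_lockfile = true, and the dynamodb_table argument is deprecated.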
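The prerequisites promise an S3 bucket and a DynamoDB table for remote state; a minimal sketch of that one-time bootstrap stack follows. The bucket and table names match the backend.tf example above, and the KMS key ARN is assumed to be passed in as a variable (state_kms_key_arn is a hypothetical name). Because the backend cannot use a bucket that does not exist yet, this stack typically starts with local state and can be moved into the bucket afterwards with terraform init -migrate-state.

```hcl
# bootstrap/main.tf: one-time stack that creates the remote state backend.
provider "aws" {
  region = "ap-south-1"
}

# Hypothetical variable: ARN of the customer-managed key used for state objects
variable "state_kms_key_arn" {
  description = "ARN of the customer-managed KMS key that encrypts state objects"
  type        = string
}

resource "aws_s3_bucket" "tf_state" {
  bucket = "mycompany-terraform-state"

  lifecycle {
    prevent_destroy = true # losing this bucket means losing every state file
  }
}

# Versioning keeps earlier state files so a corrupted one can be rolled back
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Default encryption for every object written to the bucket
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm     = "aws:kms"
      kms_master_key_id = var.state_kms_key_arn
    }
  }
}

# State files must never be publicly readable
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Lock table: a single String partition key named LockID, on-demand billing
resource "aws_dynamodb_table" "tf_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}
```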
Step 3: Building Reusable Modules
Modules encapsulate related resources into testable, reusable units. A good module hides implementation complexity and exposes configuration through variables:
```hcl
# modules/networking/main.tf
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = merge(var.common_tags, {
    Name = "${var.project}-${var.environment}-vpc"
  })
}

# environments/production/main.tf
module "networking" {
  source = "../../modules/networking"

  vpc_cidr    = "10.0.0.0/16"
  project     = "orderplatform"
  environment = "production"
  common_tags = local.common_tags # defined in a locals block in this environment
}
```
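A relative source path like the one above always uses whatever is in the working copy, so it cannot be version-pinned. Once modules live in their own repository, pin them to Git tags (e.g., ?ref=v1.2.0), or use the version argument for registry modules, to prevent upstream changes from breaking deployments unexpectedly.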
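To make the module's configuration surface explicit, a sketch of its variables and outputs, plus the same module call pinned to a Git tag, might look like this. The repository URL (github.com/myorg/terraform-modules) and the validation rule are assumptions for illustration, not part of the original setup.

```hcl
# modules/networking/variables.tf
variable "vpc_cidr" {
  description = "CIDR block for the VPC"
  type        = string

  validation {
    condition     = can(cidrhost(var.vpc_cidr, 0)) # rejects strings that are not CIDR blocks
    error_message = "vpc_cidr must be a valid CIDR block, e.g. 10.0.0.0/16."
  }
}

variable "project" {
  description = "Project name used in resource names"
  type        = string
}

variable "environment" {
  description = "Environment name (dev, staging, production)"
  type        = string
}

variable "common_tags" {
  description = "Tags merged into every resource in the module"
  type        = map(string)
  default     = {}
}

# modules/networking/outputs.tf
output "vpc_id" {
  description = "ID of the created VPC"
  value       = aws_vpc.main.id
}

# environments/production/main.tf: the module sourced from a separate
# Git repository (hypothetical URL) and pinned to a release tag
locals {
  common_tags = {
    Project = "OrderPlatform"
  }
}

module "networking" {
  source = "git::https://github.com/myorg/terraform-modules.git//networking?ref=v1.2.0"

  vpc_cidr    = "10.0.0.0/16"
  project     = "orderplatform"
  environment = "production"
  common_tags = local.common_tags
}
```

Upgrading then becomes an explicit, reviewable change to the ref value rather than a silent side effect of someone pushing to the module repository.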
Step 4: Implementing a Tagging Strategy
Consistent tags enable cost allocation, automation, and compliance auditing. Use the AWS provider’s default_tags to apply tags automatically to all resources:
```hcl
provider "aws" {
  region = "ap-south-1"

  default_tags {
    tags = {
      Project     = "OrderPlatform"
      Environment = var.environment
      ManagedBy   = "terraform"
      Owner       = "platform-team"
      CostCenter  = "engineering"
      Repository  = "github.com/myorg/infra"
    }
  }
}
```
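This keeps Terraform-managed resources from showing up untagged in cost reports and simplifies resource identification during incident response. Two caveats: default_tags only reach resources Terraform creates through this provider (not, for example, instances launched later by an Auto Scaling group), and tags appear in cost reports only after you activate them as cost allocation tags in the AWS Billing console.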
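The provider block above reads var.environment, so it helps to validate that value, and individual resources can still add or override tags. A minimal sketch, where the bucket name and the extra tag values are hypothetical:

```hcl
# variables.tf: guards the value that ends up in the Environment tag
variable "environment" {
  description = "Deployment environment; also written to the Environment tag"
  type        = string

  validation {
    condition     = contains(["dev", "staging", "production"], var.environment)
    error_message = "environment must be one of: dev, staging, production."
  }
}

# Resource-level tags are merged with default_tags;
# on a key clash, the resource-level value wins.
resource "aws_s3_bucket" "artifacts" {
  bucket = "orderplatform-${var.environment}-artifacts" # hypothetical name

  tags = {
    Owner              = "orders-team" # overrides the default Owner tag
    DataClassification = "internal"    # added on top of the defaults
  }
}

# tags_all exposes the effective, merged tag set for this resource
output "artifact_bucket_tags" {
  value = aws_s3_bucket.artifacts.tags_all
}
```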
Step 5: Managing Secrets Securely
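Terraform state files store every attribute value, including passwords and other secrets, in plaintext. Protect sensitive data with these practices:
- Encrypt state at rest using AWS KMS customer-managed keys
- Reference secrets from AWS Secrets Manager instead of passing them in as variables. This keeps them out of code and .tfvars files, but the resolved value is still written to state, which is why encrypting state matters: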
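```hcl
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "production/database/master-password"
}

resource "aws_db_instance" "main" {
  engine            = "postgres"
  instance_class    = "db.r6g.large"
  allocated_storage = 100       # GiB; required for non-Aurora engines
  username          = "dbadmin" # avoid "admin", which RDS for PostgreSQL rejects as reserved
  # Assumes the secret stores a plain string; use jsondecode() if it holds JSON key/value pairs
  password          = data.aws_secretsmanager_secret_version.db_password.secret_string

  # The password above still lands in state. With recent AWS provider versions you can
  # instead set manage_master_user_password = true (and drop password) so RDS keeps it
  # in Secrets Manager and it never enters Terraform state.

  lifecycle {
    ignore_changes = [password]
  }
}
```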
- Never commit .tfvars files containing sensitive values to Git
- Use AWS IAM roles (not access keys) for Terraform execution in CI/CD
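The last practice can be sketched as an IAM role that GitHub Actions assumes through OIDC, so the pipeline never holds long-lived access keys. This assumes the GitHub OIDC identity provider already exists in the account (for example, created once in global/iam) and that the repository is myorg/infra, matching the Repository tag from Step 4. The role name is hypothetical, and you would still attach a permissions policy scoped to what Terraform manages.

```hcl
# Assumes the GitHub OIDC provider was created once, e.g. in global/iam
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

data "aws_iam_policy_document" "github_trust" {
  statement {
    actions = ["sts:AssumeRoleWithWebIdentity"]

    principals {
      type        = "Federated"
      identifiers = [data.aws_iam_openid_connect_provider.github.arn]
    }

    # Tokens must be minted for AWS STS
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:aud"
      values   = ["sts.amazonaws.com"]
    }

    # Only this repository: pull request runs, main-branch runs without an
    # environment, and jobs that use the "production" environment
    condition {
      test     = "StringEquals"
      variable = "token.actions.githubusercontent.com:sub"
      values = [
        "repo:myorg/infra:pull_request",
        "repo:myorg/infra:ref:refs/heads/main",
        "repo:myorg/infra:environment:production",
      ]
    }
  }
}

resource "aws_iam_role" "terraform_ci" {
  name               = "terraform-ci" # hypothetical role name
  assume_role_policy = data.aws_iam_policy_document.github_trust.json
}
```

In the pipeline, the aws-actions/configure-aws-credentials action can then assume this role via its role-to-assume input. In practice, many teams split this into a read-only plan role for pull requests and a separate apply role restricted to the production environment.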
Step 6: Automating with CI/CD Pipelines
Production Terraform should never be executed from developer laptops. Implement a pipeline that enforces review and security scanning:
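```yaml
# .github/workflows/terraform.yml
name: Terraform CI/CD
on:
  pull_request:
    paths: ['infrastructure/**']
  push:
    branches: [main]

permissions:
  id-token: write   # lets jobs assume an AWS IAM role via OIDC (no access keys)
  contents: read
defaults:
  run:
    working-directory: infrastructure/environments/production
jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}  # hypothetical secret name
          aws-region: ap-south-1
      - run: terraform init -input=false
      - run: terraform validate
      - run: pip install checkov && checkov -d . --framework terraform
      - run: terraform plan -input=false -out=tfplan
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: infrastructure/environments/production/tfplan

  apply:
    needs: plan
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    runs-on: ubuntu-latest
    environment: production   # configure required reviewers on this environment
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: ap-south-1
      - uses: actions/download-artifact@v4   # apply the exact plan produced by the plan job
        with:
          name: tfplan
          path: infrastructure/environments/production
      - run: terraform init -input=false
      - run: terraform apply -input=false tfplan
```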
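Combined with branch protection on main (required pull request reviews) and required reviewers on the production environment, this ensures every infrastructure change is reviewed, scanned for security issues by Checkov, and recorded with a full audit trail in the pipeline history.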
Step 7: Detecting and Managing Drift
Manual console changes create drift that undermines IaC governance. Schedule regular drift detection:
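```bash
# Run as a cron job or scheduled pipeline
terraform plan -detailed-exitcode -input=false
status=$?
# 0 = no changes (infrastructure matches config)
# 1 = error (the plan itself failed)
# 2 = changes present: drift, or merged code not yet applied (alert the team)
if [ "$status" -eq 2 ]; then
  echo "Drift detected" >&2
fi
```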
When drift is detected, decide whether to import the manual change into Terraform or revert infrastructure to match the declared state.
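Terraform 1.5 and later (the version this guide assumes) support declarative import blocks, which make the "import" path reviewable in a pull request like any other change. A minimal sketch, assuming a security group was created by hand in the console; the resource name and ID are hypothetical:

```hcl
# imports.tf: adopt a manually created security group instead of deleting it.
# Both the address and the ID below are hypothetical.
import {
  to = aws_security_group.app
  id = "sg-0123456789abcdef0"
}
```

Running terraform plan -generate-config-out=generated.tf lets Terraform draft a matching resource block for review before the import is applied. If you choose to revert instead, a normal terraform apply restores the declared values of attributes Terraform already manages; resources created entirely outside Terraform are not in state, so apply will neither adopt nor delete them.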
Common Mistakes to Avoid
- Monolithic state files – Split state by domain (networking, compute, database) to reduce blast radius
- Missing state locking – Always configure DynamoDB locking to prevent corruption
- Hardcoded values – Use variables and data sources for account IDs, AMIs, and region names
- Skipping plan review – Always review terraform plan output before applying changes
- Unpinned module versions – Pin to specific Git tags to prevent breaking changes
- No prevent_destroy on critical resources – Protect databases, Amazon S3 buckets, and AWS KMS keys from accidental deletion
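Several of these mistakes can be avoided directly in configuration. The sketch below pins Terraform and AWS provider versions, looks up the account ID, region, and AMI at plan time instead of hardcoding them, and protects a KMS key with prevent_destroy. The provider constraint (~> 5.0), the Amazon Linux 2023 name pattern, and the key description are assumptions to adapt.

```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # pin the provider major version, like module tags
    }
  }
}

# Look up account and region instead of hardcoding them
data "aws_caller_identity" "current" {}
data "aws_region" "current" {}

# Most recent Amazon Linux 2023 AMI (name pattern is an assumption)
data "aws_ami" "al2023" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["al2023-ami-2023.*-x86_64"]
  }
}

locals {
  account_id = data.aws_caller_identity.current.account_id
  region     = data.aws_region.current.name
}

# Any plan that would destroy this key fails with an error
resource "aws_kms_key" "data" {
  description             = "Encrypts production data" # hypothetical purpose
  deletion_window_in_days = 30
  enable_key_rotation     = true

  lifecycle {
    prevent_destroy = true
  }
}
```

Keep in mind that prevent_destroy only guards against plans that destroy the resource while its block is still in the configuration; deleting the block removes the protection, so pair it with code review.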
Conclusion
Terraform on AWS delivers tremendous value when operated with production discipline. Remote state with locking, modular architecture, automated CI/CD pipelines, security scanning, and drift detection form the essential foundation.
Drop a query if you have any questions regarding Terraform, and we will get back to you quickly.
FAQs
1. Why is remote state management important when using Terraform in production?
ANS: – Remote state management enables multiple team members and automation pipelines to work with the same infrastructure safely. Storing state in a centralized location such as Amazon S3, combined with state locking, prevents conflicting updates and reduces the risk of infrastructure inconsistencies. It also provides backup and recovery capabilities through versioned state files.
2. How do Terraform modules improve infrastructure management?
ANS: – Terraform modules promote reusability, standardization, and maintainability by grouping related resources into reusable building blocks. Teams can deploy consistent infrastructure across environments while reducing duplicate code. Modules also simplify updates because changes can be implemented in a single location and reused wherever the module is referenced.
3. What is infrastructure drift, and why should organizations monitor it?
ANS: – Infrastructure drift occurs when deployed cloud resources no longer match the configuration defined in Terraform code, often due to manual changes made through the AWS Management Console or other tools. Regular drift detection helps maintain governance, ensures infrastructure remains compliant with organizational standards, and prevents unexpected behavior during future deployments.
WRITTEN BY Samarth Kulkarni
Samarth is a Senior Research Associate and AWS-certified professional with hands-on expertise in over 25 successful cloud migration, infrastructure optimization, and automation projects. With a strong track record in architecting secure, scalable, and cost-efficient solutions, he has delivered complex engagements across AWS, Azure, and GCP for clients in diverse industries. Recognized multiple times by clients and peers for his exceptional commitment, technical expertise, and proactive problem-solving, Samarth leverages tools such as Terraform, Ansible, and Python automation to design and implement robust cloud architectures that align with both business and technical objectives.
June 22, 2026