# 004 - Terraform Adoption
Last Modified: July 1st, 2025
Title: Adoption of Terraform for Provisioning AWS Backend Services
Status: Implemented
# Context
Our current approach to provisioning and managing AWS backend services is fragmented and inconsistent. We utilize a mix of AWS SAM (Serverless Application Model) templates for serverless components and custom Bash scripts for other AWS resources. This hybrid method has led to several critical challenges:
- Inconsistency: Different teams and projects use varying patterns and versions for deploying similar resources, leading to configuration drift and operational inconsistencies across environments.
- Manual Errors: Reliance on imperative Bash scripts introduces a high potential for human error during deployments and updates, making operations brittle and unreliable.
- Lack of Standardization: There is no unified "Infrastructure as Code" (IaC) standard, hindering reusability and making it difficult to onboard new team members effectively.
- Poor Change Management: Tracking infrastructure changes is cumbersome, as modifications are scattered across multiple script files and SAM templates, lacking a centralized state.
- Slow Deployments and Rollbacks: Manual or script-driven deployments are often slow, and rolling back to a previous known good state is complex and risky.
- Limited Scope of SAM: SAM is excellent for serverless, but it doesn't adequately cover the broader spectrum of AWS backend services (e.g., EC2, RDS, VPC configurations) without significant workarounds or additional tooling.
This decision is needed to establish a robust, standardized, and reliable IaC practice that can scale with our growing infrastructure needs and improve overall operational efficiency.
# Decision
We will adopt Terraform as the standard Infrastructure as Code (IaC) tool for provisioning, managing, and versioning all backend AWS services, phasing out the use of AWS SAM templates and custom Bash scripts for new infrastructure deployments.
# Rationale
This decision is driven by the need for a comprehensive, declarative, and consistent IaC solution that can manage our diverse AWS backend infrastructure effectively.
Factors that influenced the decision:
- Standardization: Terraform offers a single, unified language and workflow for defining and deploying all AWS resources, promoting consistency across projects and teams.
- Declarative Configuration: Its declarative nature allows us to define the desired state of our infrastructure; Terraform then computes the changes needed to reach that state, minimizing manual intervention and errors.
- State Management: Terraform maintains a state file that maps real-world resources to our configuration, enabling precise tracking of infrastructure changes and making configuration drift detectable.
- Modularity and Reusability: Terraform's module system allows us to create reusable, versioned infrastructure components, accelerating development and ensuring best practices are propagated.
- Plan/Apply Workflow: The `terraform plan` command provides a clear preview of changes before they are applied, significantly reducing the risk of unexpected modifications in production environments.
- Extensive AWS Provider Support: Terraform's AWS provider is comprehensive and actively maintained, supporting a vast array of AWS services and features.
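
As a minimal sketch of the declarative style described in the factors above, the following configuration declares a VPC and one subnet. The resource names, CIDR ranges, variable names, and version pins are illustrative assumptions, not values taken from our environments.

```hcl
# Illustrative only: names, CIDR ranges, and version pins are assumptions.
terraform {
  required_version = ">= 1.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "environment" {
  type    = string
  default = "dev"
}

provider "aws" {
  region = var.aws_region
}

# Desired state: one VPC for backend services.
resource "aws_vpc" "backend" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true

  tags = {
    Name = "backend-${var.environment}"
  }
}

# Referencing aws_vpc.backend.id creates an implicit dependency,
# so Terraform creates the VPC before the subnet.
resource "aws_subnet" "private_a" {
  vpc_id            = aws_vpc.backend.id
  cidr_block        = "10.0.1.0/24"
  availability_zone = "${var.aws_region}a"

  tags = {
    Name = "backend-${var.environment}-private-a"
  }
}
```

Running `terraform plan` against a configuration like this lists the resources to be created without changing anything; `terraform apply` then makes the changes and records them in state.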
Evidence/Research: Terraform is an industry-leading IaC tool with widespread adoption by organizations of all sizes. Its robust state management, idempotent operations, and ability to manage complex resource dependencies are well-documented and proven in production environments. Compared to SAM, which is primarily focused on serverless applications and less flexible for broader infrastructure, and Bash scripts, which are imperative, error-prone, and lack state management, Terraform provides a superior, holistic solution for our IaC needs.
Strengths of the chosen solution:
- Comprehensive AWS Coverage: Manages virtually any AWS resource, providing a single source of truth for our entire backend infrastructure.
- Idempotency: Running the same Terraform configuration multiple times will result in the same infrastructure state, ensuring consistency.
- Version Control for Infrastructure: Terraform configurations can be stored in Git, allowing for full version history, code reviews, and rollbacks.
- Reduced Manual Errors: The declarative approach and `plan` step drastically reduce the chance of human error during provisioning.
# Implications
Adopting Terraform will have significant consequences across our development and operations workflows.
People/Training:
- All development and operations teams involved in provisioning AWS backend services will require comprehensive training on Terraform's HashiCorp Configuration Language (HCL) syntax, best practices, module development, and state management.
- Specifically, the Software Architecture team will conduct working sessions to bring the Infrastructure team up to speed on converting to Terraform, focusing on practical application and migration strategies.
- Emphasis will be placed on understanding the `plan`/`apply` workflow and remote state management.
- A dedicated "Terraform champion" or small team may be needed initially to guide adoption and develop foundational modules.
Process Adjustments:
- CI/CD Pipelines: Existing CI/CD pipelines for deploying AWS backend services will need significant updates to incorporate Terraform `init`, `plan`, and `apply` steps. These pipelines will be implemented and run within Azure DevOps. Automated testing of Terraform configurations will also be integrated.
- Code Review: New code review processes will be established specifically for Terraform configurations, focusing on security, cost optimization, and adherence to best practices.
- Deprecation: For new AWS backend services, the strong preference is to provision them directly via Terraform from the outset. However, `terraform import` will be utilized as the mechanism to bring any existing resources (including those recently created outside of IaC) under Terraform management as part of the phased migration.
- State Management Strategy: A robust strategy for managing Terraform state files (e.g., using S3 for backend storage with native S3 state locking) must be implemented and enforced.
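
To illustrate the import step mentioned under Deprecation, here is a hedged sketch using a configuration-driven `import` block (available in Terraform 1.5 and later), which is the declarative counterpart of the `terraform import` command. The bucket name and resource address are hypothetical placeholders.

```hcl
# Hypothetical example: an S3 bucket that was created outside of IaC.
# The bucket name and resource address are placeholders.
import {
  to = aws_s3_bucket.legacy_uploads
  id = "example-legacy-uploads" # for aws_s3_bucket, the import ID is the bucket name
}

# The resource block must describe the existing bucket; after the import,
# `terraform plan` should ideally report no further changes.
resource "aws_s3_bucket" "legacy_uploads" {
  bucket = "example-legacy-uploads"
}
```

The equivalent CLI form would be `terraform import aws_s3_bucket.legacy_uploads example-legacy-uploads`. One possible advantage of the block form is that the import appears in the `plan` output and can be reviewed in a pull request like any other change.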
Tooling:
- Terraform CLI: This will become a mandatory tool for all infrastructure developers and CI/CD agents.
- Remote State Backend: Implementation of a remote state backend (e.g., AWS S3 with native S3 state locking) is crucial for collaborative development and preventing state corruption.
- Linting/Static Analysis: Integrate tools like `terraform validate` and `tflint` into the development workflow and CI/CD.
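
A remote state backend along the lines described above might be configured as follows. This is a sketch under stated assumptions: the bucket name, state key, and region are hypothetical, and `use_lockfile` requires a Terraform release that supports native S3 locking (1.10 or later).

```hcl
# Sketch of an S3 remote state backend with native S3 locking.
# Bucket, key, and region are hypothetical placeholders.
terraform {
  required_version = ">= 1.10"

  backend "s3" {
    bucket       = "example-org-terraform-state"
    key          = "backend-services/prod/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true # encrypt the state object at rest
    use_lockfile = true # S3-native state locking, no DynamoDB table needed
  }
}
```

After adding or changing a backend block, `terraform init` must be run to initialize it. `terraform validate` and `tflint` can then run against the same working directory, both locally and in CI/CD.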
Risks:
- Initial Learning Curve: Teams unfamiliar with declarative IaC or HCL may experience a temporary decrease in velocity during the initial learning phase.
- State File Management: Improper management of Terraform state files can lead to infrastructure corruption or resource loss. Strict protocols and automation are required.
- Migration Complexity: Migrating existing, manually provisioned or script-managed resources to Terraform can be complex and requires careful planning to avoid downtime or unintended changes.
- Security: Ensuring that Terraform configurations follow security best practices and that state files are securely stored and accessed is paramount.
- "Terraform Drift": Manual changes to infrastructure outside of Terraform can lead to discrepancies between the actual state and the Terraform state, requiring careful reconciliation.
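
Some of the state-related risks above can be reduced by hardening the bucket that stores state. The following is one possible sketch, assuming the state bucket is itself managed by a small, separate bootstrap configuration; the bucket name is a hypothetical placeholder.

```hcl
# Hypothetical bootstrap configuration for the state bucket.
resource "aws_s3_bucket" "tf_state" {
  bucket = "example-org-terraform-state"

  # Make Terraform refuse any plan that would destroy the state bucket.
  lifecycle {
    prevent_destroy = true
  }
}

# Keep previous state versions so a corrupted state can be recovered.
resource "aws_s3_bucket_versioning" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Encrypt state at rest; state files can contain sensitive values.
resource "aws_s3_bucket_server_side_encryption_configuration" "tf_state" {
  bucket = aws_s3_bucket.tf_state.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "tf_state" {
  bucket                  = aws_s3_bucket.tf_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```

For drift, a scheduled `terraform plan -detailed-exitcode` run in CI is one way to surface manual changes early: exit code 2 indicates that the plan contains changes.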
# Trade-Offs
Benefits:
- Consistent Infrastructure: Ensures all AWS backend services are provisioned and managed uniformly.
- Faster and More Reliable Deployments: Automated, declarative deployments reduce errors and speed up delivery.
- Improved Change Tracking: All infrastructure changes are versioned in Git, providing a clear audit trail.
- Enhanced Reusability: Modular design promotes sharing of common infrastructure patterns.
- Reduced Manual Errors: Eliminates the need for manual configuration and scripting.
- Easier Disaster Recovery: Infrastructure can be quickly rebuilt from code.
- Better Collaboration: Centralized configurations and state facilitate team collaboration.
Drawbacks:
- Initial Investment: Time and resources required for training, tooling setup, and initial module development.
- Migration Effort: Existing infrastructure needs to be imported or recreated in Terraform, which can be a substantial undertaking.
- State Management Overhead: Requires careful management of Terraform state files to prevent issues.
- Increased Complexity for Simple Deployments: For very simple, one-off resources, using Terraform might feel like overkill compared to a quick SAM template (though the long-term benefits outweigh this).
# Key Evaluation Metrics
The following criteria will be used to measure whether the decision solves the intended problems.
- Deployment Time Reduction: Measure the average time taken to provision a new, complex backend AWS service (e.g., a new VPC with EC2 instances and RDS database), aiming for a significant reduction compared to previous methods.
- Infrastructure Consistency: Track the number of configuration discrepancies or "drift" incidents between environments, aiming for near-zero.
- Reduction in Infrastructure-Related Incidents: Monitor the number of production incidents directly attributable to manual provisioning errors or inconsistent configurations, expecting a substantial decrease.
- Terraform Adoption Rate: Percentage of new AWS backend services provisioned entirely via Terraform within the first 6-12 months.
- Developer Satisfaction: Gather feedback from development teams regarding the ease, speed, and reliability of provisioning infrastructure using Terraform compared to previous methods.
- Module Reusability: Track the number of reusable Terraform modules created and adopted across different projects.
# Conclusion
We strongly recommend adopting Terraform as our primary Infrastructure as Code solution for all AWS backend services. This move will standardize our infrastructure provisioning, significantly reduce manual errors, improve deployment velocity, and enhance the overall reliability and maintainability of our cloud environments. While there will be an initial investment in training and migration, the long-term benefits of a consistent, version-controlled, and auditable infrastructure far outweigh these challenges.
The next steps include:
- Establishing a pilot project to implement Terraform for a new backend service.
- Developing a comprehensive training plan for all affected teams.
- Defining a clear strategy for managing Terraform state and CI/CD integration.
- Planning the phased migration of existing infrastructure.
# References
- Terraform Documentation: https://www.terraform.io/docs/
- Terraform in 100 Seconds
- Code Using Terraform