Terraform makes it easy to manage and update your infrastructure by keeping track of all your resources within a file called the "state file." This file tells Terraform what's already in place, so any changes can be made accurately. For example, if you're setting up an instance, Terraform records its details in the state file. Then, when you add more servers or update settings, Terraform uses this record to apply only the necessary changes.
However, in a team setting, this file can quickly create a challenge.
If two team members try to make changes at the same time, they might accidentally overwrite each other's work. Additionally, without a history of previous states, it's hard to undo changes if something goes wrong. This is where features like state locking and versioning become important: they help ensure that only one person makes updates at a time and keep a record of past states, so teams can safely work together and recover from mistakes.
This blog covers the challenges of managing Terraform state files in shared environments and solutions to keep things smooth and reliable. We'll explain why locking the state file is crucial to avoid conflicts, how it works with AWS backends like S3 and DynamoDB, and how Terraform Cloud provides a simpler alternative with built-in locking. We'll also look at the benefits of versioning, such as easy rollbacks and tracking changes, and show how to set this up in S3. Finally, we'll introduce Firefly, a tool for viewing deleted resources and past versions, adding visibility and control over changes.
Local Vs. Remote State in Terraform
In Terraform, a local state file is stored on your machine, tracking information about your infrastructure. This is straightforward and works well for small, solo projects, like setting up a personal website. But as your team grows, managing the state locally can become challenging. Everyone needs access to the latest state; without it, conflicts and inconsistencies can arise, such as unintentional overwrites that cause misconfigurations or even infrastructure downtime.
A remote backend, such as Azure Blob Storage, AWS S3 or Google Cloud Storage, centralizes the state file. It ensures that everyone on the team works with the same up-to-date state. It also provides the added benefit of state locking (e.g., with DynamoDB on AWS) to prevent multiple people from making conflicting changes at the same time.
Switching to a remote backend with state locking is straightforward: you declare an S3 backend in the terraform block and name the DynamoDB table that holds the lock.
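A minimal sketch of such a backend block follows. The bucket name, key, region, and table name are hypothetical placeholders, not values from this article, and the bucket and table are assumed to exist already:

```hcl
terraform {
  backend "s3" {
    bucket         = "example-terraform-state" # hypothetical bucket name
    key            = "prod/terraform.tfstate"  # hypothetical path of the state object
    region         = "us-east-1"               # assumed region
    dynamodb_table = "terraform-state-lock"    # hypothetical lock table name
    encrypt        = true                      # encrypt the state object at rest
  }
}
```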
This configuration helps prevent conflicts and keeps a reliable record of changes, making collaboration smoother and safer.
Here's a quick comparison of remote and local state in Terraform:
Now that we have discussed the shift from local to remote state management, let's look at state locking and how to implement it in our state file in AWS S3.
What is the Need for State Locking in Terraform?
Let's consider a scenario where two engineers, Engineer A and Engineer B, are working on a cloud-based application, for example, an EC2 instance hosted on AWS. Engineer A is tasked with adding tags to the instance, while Engineer B is trying to modify its instance size. Both engineers are working on the same Terraform state file stored remotely, say in an AWS S3 bucket.
If both engineers run terraform apply at the same time, Terraform doesn't know that both are modifying the same resource. Engineer A's changes could be overwritten by Engineer B's, or worse, some of the modifications might be partially applied, resulting in an inconsistent state. This can lead to issues like missing tags or incorrect instance sizes, causing confusion or downtime when the infrastructure doesn't match the intended configuration.
State locking in Terraform helps to prevent this inconsistency. When state locking is enabled (via S3 and DynamoDB or Terraform Cloud), Terraform ensures that only one user can make changes to the state file at a time. This prevents the scenario where multiple engineers are modifying the same infrastructure simultaneously, thus avoiding race conditions and conflicting updates.
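To make the scenario concrete, the resource both engineers are editing might look like the sketch below. The resource name, AMI ID, instance type, and tag values are all hypothetical and serve only as illustration:

```hcl
resource "aws_instance" "app" {
  ami = "ami-0123456789abcdef0" # hypothetical AMI ID

  # Engineer B edits this line to resize the instance.
  instance_type = "t3.micro"

  # Engineer A adds or changes entries in this map.
  tags = {
    Name = "app-server" # hypothetical tag
  }
}
```

Both edits touch the same resource entry in the same state file, which is why the two applies need to run one after the other.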
Here's a quick comparison of how things differ with and without state locking:
Implementing State Locking with Remote Backends
In Terraform, state locking ensures that only one process can modify the state at a time. When you store your state remotely (e.g., in AWS S3), you need a way to prevent multiple Terraform processes from simultaneously updating the same state file. This is where DynamoDB comes into play. It acts as a lock manager, ensuring that only one operation can modify the state at any given time.
Here's a breakdown of the above diagram:
- DevOps/Infra Engineer writes Terraform manifest files (.tf) for infrastructure provisioning.
- These manifest files are processed by Terraform to create or manage infrastructure resources in AWS.
- Terraform acquires a lock on the state file to prevent simultaneous changes, using AWS DynamoDB for state locking.
- AWS API is used by Terraform to interact with various AWS services like S3, EC2, RDS, and MQ.
- Terraform state file (.tfstate) is stored in AWS S3 as a remote backend, ensuring centralized state management.
- When the lock is acquired, changes are applied safely to AWS resources.
- If the lock isn't acquired, the state file access is denied, preventing conflicting infrastructure updates.
Using AWS S3 and DynamoDB Locking Setup
For users managing Terraform state on AWS, a common setup is to use an S3 bucket for state storage and DynamoDB for state locking. When a user runs terraform apply, Terraform checks DynamoDB to see if a lock exists. If no lock is found, Terraform creates one, which prevents others from running Terraform simultaneously and modifying the state file. Once the operation completes, Terraform deletes the lock from DynamoDB, allowing others to access the state file.
Here's how you can set this up:
- S3 Configuration: Create an S3 bucket to store your .tfstate files. Ensure that the bucket has proper access policies so that only authorized users can read or write state files.
- DynamoDB for Locking: Set up a DynamoDB table to store the lock. DynamoDB will act as a lock manager, ensuring that only one Terraform process can modify the state at a time. You define the table in your Terraform configuration with a string partition key named LockID, and Terraform manages the lock entries in it automatically.
- S3 as Remote Backend: Configure S3 as the Terraform backend so that Terraform stores and updates the state file there.
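The three steps above can be sketched in one configuration. This is a minimal example rather than a production setup: the bucket name, table name, key, and region are hypothetical, and a recent AWS provider is assumed.

```hcl
provider "aws" {
  region = "us-east-1" # assumed region
}

# Step 1: bucket that stores the .tfstate files (hypothetical name).
resource "aws_s3_bucket" "terraform_state" {
  bucket = "example-terraform-state"
}

# Keep the state bucket private; finer-grained access is left to IAM policies.
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Step 2: lock table. The S3 backend expects a string partition key named "LockID".
resource "aws_dynamodb_table" "terraform_lock" {
  name         = "terraform-state-lock" # hypothetical table name
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }
}

# Step 3: use the bucket and table above as the remote backend.
terraform {
  backend "s3" {
    bucket         = "example-terraform-state"
    key            = "prod/terraform.tfstate" # hypothetical state path
    region         = "us-east-1"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}
```

Backend blocks cannot reference resources or variables, so the bucket and table names are repeated as literal strings.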
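Now that S3 and DynamoDB are defined in the Terraform configuration, run terraform init to initialize the working directory and download dependencies such as the AWS provider, then create the resources with terraform apply -lock=false. The -lock=false flag is only for this first apply: the DynamoDB table does not exist yet, so Terraform cannot acquire a lock through it. Drop the flag on every later run so that locking is enforced. If terraform init fails because the bucket does not exist yet, a common alternative is to create the bucket and table with local state first, then add the backend block and run terraform init -migrate-state.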
You can verify that state locking is enabled by running two apply commands in separate terminals at the same time. One acquires the lock and proceeds; the other fails with an error saying it could not acquire the state lock.
With S3 and DynamoDB working together, you gain reliable state storage and locking, making it suitable for production environments where consistency is key.
Terraform Cloud Locking
In the open-source or business-source version of Terraform (CLI), setting up state locking requires a bit of manual work. You need to configure services like AWS S3 for state storage and DynamoDB for state locking, which can involve several steps to ensure that the state is locked and protected from concurrent changes.
However, with Terraform Cloud (HashiCorp's managed service, since renamed HCP Terraform), the setup becomes much easier. Terraform Cloud comes with built-in state storage and automatic state locking. This means that you don't have to manually configure separate services like S3 with DynamoDB. With Terraform Cloud, state locking is enabled by default, allowing teams to focus more on infrastructure management rather than spending time configuring state-related services. This suits teams that want a fully managed solution without the complexity of manual setup.
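As a rough sketch, connecting a configuration to Terraform Cloud takes a single cloud block (available in Terraform 1.1 and later). The organization and workspace names below are hypothetical, and the example assumes you have already authenticated with terraform login:

```hcl
terraform {
  cloud {
    organization = "example-org" # hypothetical organization name

    workspaces {
      name = "example-workspace" # hypothetical workspace name
    }
  }
}
```

No bucket or lock table appears here: state storage and locking are handled by the workspace itself.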
Now that we have state locking in place, let's explore what versioning is and how to implement it for your Terraform state file.
Use Case of Versioning Terraform State Files
State versioning is another practice for managing Terraform state files. It provides a historical view of state changes, enabling rollback, auditing, and recovery from misconfigurations. Imagine you apply a configuration change that accidentally deletes critical resources. With versioning enabled, you can roll back to a previous state, recovering your infrastructure without spending hours troubleshooting logs or re-deploying everything manually.
For users on AWS S3 or Google Cloud Storage (GCS), enabling versioning on the storage bucket ensures that each change to the .tfstate file is recorded. This version history allows teams to investigate and go back to a previous version if something goes wrong, improving both security and traceability.
Configuring Versioning for State Backends
To enable versioning in your AWS S3 bucket:
- Navigate to the S3 Console and select the bucket used for storing state files.
- Go to Properties, find the Bucket Versioning option, and enable it.
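If you prefer to manage the bucket with Terraform rather than the console, the same setting can be expressed with the AWS provider's aws_s3_bucket_versioning resource. This is a minimal sketch; the bucket name is a hypothetical placeholder for your existing state bucket:

```hcl
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = "example-terraform-state" # hypothetical name of the state bucket

  versioning_configuration {
    status = "Enabled" # keep every version written to the bucket from now on
  }
}
```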
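You can list the versions of the state file (for example, with aws s3api list-object-versions) and then download a specific version if needed (aws s3api get-object with the --version-id option).
If you want to temporarily work from a specific version, download that version locally and point Terraform to it. One way is to use the local backend and pass the file's location as path=<file> in the -backend-config parameter of terraform init.
To permanently roll back, copy the specific version back to S3 as the current state, for example with aws s3api copy-object and a copy source that includes the version ID. S3 stores the copy as the new latest version, so the newer versions remain in the history.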
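After versioning is enabled, S3 retains every version of your state file written from that point on. Terraform's state commands work on the current state rather than on S3 versions, but commands such as terraform state pull and terraform state push let you inspect the current state or upload a restored copy. Together these give you a robust backup in case of accidental deletions or misconfigurations.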
Now that we've implemented state locking and versioning, it is clear that managing infrastructure as code (IaC) takes real effort, especially in large teams working with evolving resources. One common issue is tracking the full inventory of resources and keeping tabs on deleted or modified resources. In such a dynamic environment, it's easy for important assets to be overlooked or even deleted without a clear view of historical changes. This lack of visibility can lead to confusion, security risks, and recovery challenges, making it hard to maintain an accurate picture of your infrastructure.
This is where Firefly steps in. Firefly is a tool that addresses these very issues by providing a platform for tracking resources and their versions over time. It not only keeps a detailed record of all resources but also tracks the deletion of resources, making it easier for teams to stay on top of infrastructure changes and mitigate risks.
Introduction to Firefly: Managing your Cloud Assets
Firefly provides a user-friendly dashboard where teams can view their entire infrastructure inventory, including historical state versions and deleted resources. This allows users to quickly identify what has changed, what has been deleted, and easily revert to previous states if needed. The platform gives you a clear, organized overview of your infrastructure, helping avoid the confusion that often comes with manual tracking.
In short, Firefly provides an extra layer of visibility and control that Terraform alone does not offer. Request a demo to see Firefly at work.