The Terraform state file keeps track of every resource that was deployed with Terraform in your cloud infrastructure. When you run commands like terraform apply or terraform destroy, Terraform updates the state file to reflect the changes it made.

However, if a DevOps engineer makes a change outside Terraform, such as adding a rule to a security group through the AWS console, the state file no longer matches the actual infrastructure. This mismatch is called configuration drift.

Keeping the state file up to date is important because Terraform relies on it as its record of what actually exists in the cloud. If the state file is out of date, Terraform might overwrite changes made in the cloud, leading to platform downtime.

Here’s a look at what Terraform drift is, what causes it, and why it's important to find it early. 

What Causes Drift?

Before we explore how to detect, manage, and fix Terraform drift, let’s take a look at some of the common reasons why drift occurs:

  • Using the cloud console: When you create, update, or delete resources in the cloud console instead of through your Infrastructure as Code (IaC) configuration, you introduce drift. For example, if a team member changes the instance type of an EC2 server through the AWS console but doesn’t make the same change in the Terraform configuration, there will be a difference between what Terraform expects and what’s actually in the cloud. On the next deployment, Terraform will try to revert the instance to the old type, which makes resources harder to manage and scale.
  • Using scripts and other tools: Drift can also come from changes made with tools and scripts such as the AWS CLI, PowerShell, Pulumi, or Ansible. These tools don’t update Terraform’s state file. For example, if someone uses the AWS CLI to change the tags on an S3 bucket, that change won’t be reflected in the Terraform state, which creates drift.

Why Detecting Drift is Important for Effective Cloud Infrastructure Management with Terraform

Detecting drift is important for maintaining a stable and secure infrastructure. Without it, you could face a spike in unnecessary costs, compliance violations, or even deployment failures. 

Regularly checking for drift ensures that your actual infrastructure matches the configuration you defined in code. This helps you avoid misconfigured resources and unnecessary cost. For example, if a team member changes an EC2 instance from t2.micro to t2.large through the AWS console for testing and forgets to revert it, Terraform won’t track the change. As a result, the team may end up paying for capacity it doesn’t need.

Detecting drift is especially important in production, where it helps keep your infrastructure compliant and helps your organization avoid penalties, lawsuits, and related costs. Many companies have strict rules about how infrastructure is set up. For example, if someone uses the console to make an S3 bucket that contains sensitive customer data publicly accessible, the change can violate regulations such as GDPR or HIPAA and lead to fines.

If left unnoticed, configuration drift can cause failed deployments, application errors, or even data loss. For example, suppose your Terraform configuration specifies a database at version 1.30. If someone upgrades the version directly through the cloud console, the next deployment might fail because of the version mismatch or breaking changes in the upgrade. The result can be downtime for users or data corruption caused by version incompatibility.

How to Detect Drift in Terraform

Now that we know why detecting drift is important, let’s see how to detect drift within your Terraform configuration using two main commands:

Terraform plan

The terraform plan command compares your Terraform code with the actual state of your cloud resources. It refreshes its view of each resource first, so any drift shows up in the output as a change that Terraform proposes to make.

For example, imagine you have a Terraform configuration that sets up an EC2 instance with a specific instance type, like t2.micro. If a team member changes that instance to t2.large in the AWS console, running terraform plan will show this difference. It will show that the current configuration has a t2.large instance, while your Terraform code still specifies t2.micro.

By running terraform plan, you can easily spot this change and decide how to handle it. This helps in making sure that your infrastructure matches your code.
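A regular terraform plan mixes two kinds of differences: drift in the cloud and edits to your code that haven’t been applied yet. As a minimal sketch, assuming Terraform v0.15.4 or later, the -refresh-only mode limits the report to changes made outside Terraform. The function name show_drift and its directory argument are illustrative, not part of Terraform.

```shell
# Sketch: list only the changes made outside Terraform (drift),
# leaving out edits to the .tf files that haven't been applied yet.
# show_drift and its argument are illustrative names.
show_drift() {
  local dir="${1:-.}"   # directory that holds your Terraform configuration
  # -refresh-only: compare the state file with the real resources and report differences
  # -input=false:  never stop to prompt for variables (useful in scripts)
  terraform -chdir="$dir" plan -refresh-only -input=false -no-color
}
```

Calling show_drift ./infra (the path is a placeholder) should list the instance type change from the example above without proposing to revert it.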

Terraform refresh

The terraform refresh command updates the state file to match the infrastructure’s current state. Running this command makes sure that the state file holds the latest configuration of each resource in your infrastructure.

For example, if you have a Terraform configuration for an S3 bucket, and another developer changes the bucket's permissions to ‘public’ through the AWS console, your local state will still show it as ‘private’. If you don’t run terraform refresh, Terraform won’t have the correct information about the bucket's current state.

By running terraform refresh, Terraform checks the actual state of the S3 bucket and updates your local state to reflect the new access config. However, it’s important to note that this command does not update your Terraform code. You will still need to modify your Terraform configuration to match the new settings, making sure that both your code and state are in sync. This way, you can avoid resource misconfiguration and errors.
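In current Terraform releases (v0.15.4 and later), the standalone terraform refresh command is deprecated in favor of terraform apply -refresh-only, which performs the same state update but shows the detected changes and asks for approval first. A minimal sketch, with sync_state as an illustrative wrapper name:

```shell
# Sketch: update the state file to match real infrastructure.
# `terraform apply -refresh-only` replaces the deprecated `terraform refresh`;
# it changes only the state file, never the cloud resources or the .tf code.
# sync_state is an illustrative name, not a Terraform command.
sync_state() {
  local dir="${1:-.}"   # directory that holds your Terraform configuration
  # Terraform prints the detected changes and waits for "yes" before writing the state.
  terraform -chdir="$dir" apply -refresh-only
}
```

The approval prompt is the main practical difference: you get to read what changed in the cloud before the state file accepts it.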

By running these commands regularly, you get early feedback, catch drift in its initial stages, and keep your infrastructure aligned with your Terraform configuration.

How Do You Fix Drift in Terraform?

Below, we’ll demonstrate how a small change through the AWS console can create configuration drift, and how to resolve that drift step by step.

First, let’s create a main.tf that defines a security group:

resource "aws_security_group" "firefly_sg_02" {
  name        = "firefly-sg-02"
  description = "Firefly Security Group 02"

  # Allow inbound HTTP from anywhere
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

This code will create a security group that allows incoming traffic on port 80 (HTTP).

Step 1: Apply the Configuration 

Run terraform init and then terraform apply to initialize the working directory and deploy the security group to your AWS account.
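As a sketch, the two commands from this step can be run from the directory that contains main.tf. The wrapper function deploy_sg is only an illustrative name; running the commands one by one works just as well.

```shell
# Sketch: initialize the working directory, then create the security group.
# deploy_sg is an illustrative name; AWS credentials and a region are
# assumed to be configured already (for example through environment variables).
deploy_sg() {
  # Download the AWS provider and set up the backend.
  terraform init -input=false || return $?
  # Show the execution plan and wait for "yes" before creating resources.
  terraform apply
}
```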

Step 2: Introduce a Drift 

Now, let’s say someone adds an inbound rule to allow SSH (TCP on port 22) through the AWS console. This creates a configuration drift between the actual infrastructure running in the cloud and what is defined within your Terraform code.
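To reproduce this drift without opening the console, the same rule can be added with the AWS CLI, which bypasses Terraform in the same way. This is a sketch for a test account only: the function name add_ssh_rule is illustrative, and the security group ID has to be looked up for your own deployment.

```shell
# Sketch: add an SSH ingress rule outside Terraform to simulate drift.
# add_ssh_rule is an illustrative name; pass the ID of your own security group.
add_ssh_rule() {
  local group_id="$1"            # ID of the firefly-sg-02 group in your account
  local cidr="${2:-0.0.0.0/0}"   # open to the world, matching the demo; narrow this in real use
  aws ec2 authorize-security-group-ingress \
    --group-id "$group_id" \
    --protocol tcp \
    --port 22 \
    --cidr "$cidr"
}
```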

Step 3: Identify the Drift 

To find this drift in your infrastructure, run terraform plan.

This compares the live security group with your Terraform configuration. The plan shows that the security group now has an additional SSH rule that isn’t defined in your code, and that Terraform will remove it the next time you apply.

Once configuration drift is detected, you usually have two options: keep the changes made in the infrastructure, or revert them to match your Terraform code. Let’s look at both scenarios and how to manage them using Terraform.

Scenario 1: What To Do If You Want To Keep The Changes

If you want to keep the change (the new SSH rule), update your Terraform configuration files to include it so that your code stays in sync with the actual infrastructure. terraform refresh will update the state file, but you have to add the new rule to your Terraform code yourself; otherwise the next apply will remove it.

Now, modify your Terraform code to include the new rule:

ingress {
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["0.0.0.0/0"]
}

Run terraform refresh to update your local state file to match the current configuration in the cloud. It will record the changes made directly through the console, like the additional rule we added, but note that it won’t change your Terraform code. 

Finally, run terraform apply. Because your code now includes the rule, Terraform should leave it in place, and your code, state, and cloud configuration are back in sync.

Scenario 2: What To Do If You Want To Remove The Changes

If you want to remove the unauthorized rule (the new SSH rule), just run terraform apply to override the manual changes. This will delete the changes that were made through the console and weren’t tracked by Terraform. You can check it in your AWS console to confirm that the inbound rule has been removed.
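The console check can also be scripted. As a sketch, assuming the AWS CLI is installed and using the illustrative function name ssh_rule_present, the query below lists any remaining ingress rule that starts at port 22; an empty list means the rule is gone.

```shell
# Sketch: check whether a port-22 ingress rule still exists on a security group.
# ssh_rule_present is an illustrative name; pass the ID of your own security group.
# Returns 0 if a rule is present, 1 if none is left, 2 if the AWS CLI call fails.
ssh_rule_present() {
  local group_id="$1"
  local rules
  # The JMESPath filter keeps only ingress rules whose FromPort is 22.
  rules=$(aws ec2 describe-security-groups \
    --group-ids "$group_id" \
    --query 'SecurityGroups[0].IpPermissions[?FromPort==`22`]' \
    --output json) || return 2
  # An empty JSON list means no such rule is left.
  [ "$(printf '%s' "$rules" | tr -d '[:space:]')" != "[]" ]
}
```

After terraform apply has removed the rule, the function should return 1, which makes it easy to use in a script that confirms the cleanup.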

We have fixed the drift, but it took a lot of manual effort. We had to check for drift first, and when we decided to keep the change, we had to update the Terraform code ourselves to match the infrastructure. You also have to remember to run terraform plan regularly to check whether drift has occurred.
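The reminder can be replaced by a scheduled job. A minimal sketch, assuming a cron job or CI pipeline calls it: terraform plan -detailed-exitcode exits with 0 when there is nothing to change, 1 on error, and 2 when the plan contains changes, so a script can raise a flag without anyone reading the output. The names drift_check and drift.log are illustrative.

```shell
# Sketch: a drift check meant for a scheduler such as cron or a CI job.
# drift_check and the default log path are illustrative names.
drift_check() {
  local dir="${1:-.}"          # directory that holds your Terraform configuration
  local log="${2:-drift.log}"  # where the full plan output is kept for review
  local code=0
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
  terraform -chdir="$dir" plan -detailed-exitcode -input=false -no-color \
    >"$log" 2>&1 || code=$?
  case "$code" in
    0) echo "OK: $dir matches its Terraform code" ;;
    2) echo "CHANGES: $dir differs from its Terraform code, see $log" ;;
    *) echo "ERROR: terraform plan failed in $dir, see $log" ;;
  esac
  return "$code"
}
```

Exit code 2 also fires for code changes that haven’t been applied yet, so this check is most meaningful when it runs against code that has already been deployed.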

Using Firefly to Proactively Monitor and Fix Drifts

Firefly is one of the most effective Terraform drift detection tools on the market. It automatically detects changes and misconfigurations across your entire cloud infrastructure, which makes it much simpler to keep your Terraform state and cloud infrastructure aligned.

With Firefly, you can monitor drift, revert to a previous configuration when needed, or codify drifted resources when you want to keep their changes. You can also set up drift alerts via Slack or email, so you’re always informed about changes to the infrastructure and get early feedback.

Here’s how you can make the most of Firefly for drift detection:

Firefly Dashboard for Monitoring

The Firefly dashboard gives you a single, clear view of your infrastructure’s current state across cloud providers such as GCP, AWS, and Azure, making it much simpler to track changes and detect configuration drift. You no longer need to run terraform plan to check for drift; the dashboard shows the percentage of resources that are unmanaged, drifted, and codified.

In the Firefly dashboard, you can see that 2.22% of the resources have drifted. This gives you a clear and quick overview of where your infrastructure has deviated from your original code. The dashboard also shows you resources that are unmanaged or codified, giving you a much more descriptive picture of your infrastructure.

Instead of running commands such as terraform plan to check for changes, you can simply monitor the Firefly dashboard, which helps you spot any configuration drift. Whether it's seeing newly drifted resources or reviewing codified changes, the dashboard makes drift detection and management much easier.

Firefly Drift Details

Firefly provides detailed information about any configuration drift in your infrastructure. It displays the current configuration next to the desired Terraform configuration, so you can see exactly what has changed, such as an updated security group rule, without digging through logs or running manual checks.

This detailed view makes it simple to decide whether to accept the changes or revert them, simplifying the process of managing configuration drift in your infrastructure.

Firefly’s “Codify” Button

When you want to keep a drifted change, you can use Firefly’s ‘codify’ feature. It automatically generates Terraform code for the drifted resource’s current configuration, so that your code matches what is running in the cloud.

For example, if a new ingress rule is added, Firefly will codify the change into your security group configuration. You can then review the generated code and run terraform apply to implement those changes, making the whole process faster and less error-prone.

Firefly Alerts

With Firefly, you can set up Slack alerts that notify your team whenever drift is detected. Everyone stays informed about infrastructure changes and can act quickly, before the drift causes security or infrastructure problems.

Here’s a step-by-step guide to setting up Slack alerts in Firefly:

Navigate to ‘Notifications’ in the Firefly dashboard.

Click on ‘+ Add New’ to create a new alert.

From the ‘Event Type’ dropdown, select the specific event you want to be notified about, such as drift detection.

Under ‘Criteria’, choose the relevant data source, such as AWS or any other integrated platform.

Select your notification ‘Destination’ (Slack or email) and click ‘Create’.

Now you’ll receive a notification every time drift is detected, so you can act on it immediately.

With Firefly, you get a detailed dashboard, automated drift detection, codified changes, and Slack alerts, which help you maintain a consistent cloud environment.

It makes sure that your resources stay aligned with your Terraform code, saving time and reducing the risk of errors, while keeping your team informed and ready to act on those drifts.

(If Firefly’s drift management capabilities sound interesting and you’d like to try Firefly yourself, explore our solution freely here.)