Whenever a DevOps engineer uses a cloud provider's console or UI (such as AWS, Azure, or GCP) to update or configure a specific resource and forgets to synchronize the change back into their Infrastructure-as-Code (IaC) files, such as Terraform files, a drift appears between the actual state and the desired state of the cloud resources.

In this blog, we will learn what Terraform drift is and what causes it. We will look at the consequences of drift and why prevention is important. By the end of this blog, you will know which tools you can use to avoid Terraform drift and how to implement the best practices. Additionally, you'll learn how to use Firefly as a configuration management tool to monitor and detect Terraform drift in your configuration, and see how it can make your life easier as an Infrastructure Consultant.

Introduction

Terraform works on the fundamentals of the desired state and the actual state. Let us understand what they are:

  • Your Terraform configuration files define the desired state, which includes all the resources, configurations, and dependencies defined in the .tf files.
  • The actual state is your infrastructure as it exists in the cloud provider at any given point in time, including all the resources, configurations, and current settings.

For example, suppose you create an EKS cluster using Terraform (the desired state) and then make a manual change in the cloud provider, such as decreasing the size of the nodes. Your deployments can then fail because the nodes no longer have enough capacity. Whenever the actual state differs from the desired state in this way, you have infrastructure drift, also called Terraform drift.
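To make the idea concrete, here is a minimal Python sketch of comparing a desired state with an actual state. It is only an illustration of the concept, not how Terraform is implemented, and the attribute names and values (`instance_type`, `desired_size`) are hypothetical.

```python
from __future__ import annotations

from typing import Any


def find_drift(desired: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {attribute: (desired_value, actual_value)} for every mismatch."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key)
        have = actual.get(key)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical attributes for an EKS node group (illustrative only).
desired_state = {"instance_type": "t3.large", "desired_size": 3}
actual_state = {"instance_type": "t3.small", "desired_size": 3}  # changed by hand

drift = find_drift(desired_state, actual_state)
print(drift)  # {'instance_type': ('t3.large', 't3.small')}
```

An empty result means the two states agree; any entry is an attribute that has drifted.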

Difference between the Actual State and the Desired State

What causes Terraform drift, and why is it problematic?

We saw that Terraform drift results from the difference between the actual state and the desired state. Let us take a look at what causes it:

  • Manual Configuration Changes - If you change Terraform-managed resources through the cloud provider’s UI or CLI to debug, scale, or for any other reason, you introduce Terraform drift. For example, someone changes an S3 bucket's visibility from private to public directly in the AWS Management Console, exposing the bucket data.
  • Automated Configuration Changes - Drift also appears when changes are made by automation outside Terraform, such as shell scripts, Python scripts, or another IaC tool like Pulumi. For example, a separate CI/CD pipeline that updates resources independently of Terraform might change a security group so that your application is exposed to everyone on the Internet.
  • External Integrations - Third-party services or tools like Ansible can change resources that Terraform manages, which leads to drift. For example, an Ansible playbook might upgrade dependencies to versions with vulnerabilities and security risks.
  • State File Manipulation - Any direct edit to, or corruption of, the Terraform state file (terraform.tfstate) makes the state disagree with your real infrastructure. For example, manually editing the state file to reflect a change to your VPC logs or CloudTrail logs configuration that wasn’t applied through Terraform leaves your code inconsistent, and you might lose your logs, making issues hard to debug.
  • API Changes - If the cloud provider changes its APIs or defaults, resource attributes can change without any action on your side. For example, AWS might update the default settings for a service without user intervention, which could break your production deployments.

How does drift detection work?

When your cloud infrastructure doesn't match your Terraform configuration, it’s called drift. Detecting these differences is known as drift detection.

Let us now take a look at some key points of how Terraform drift detection works:

  1. You define your infrastructure in your Terraform (.tf) files. Running the terraform apply command creates or updates your infrastructure to match this desired state and records the result in the Terraform state file (terraform.tfstate).
  2. Your terraform.tfstate file is a local or remote file that stores the state of your infrastructure as last known by Terraform.
  3. You can optionally run the terraform refresh command to update your state file with the actual state of the infrastructure on the cloud provider. In recent Terraform versions this command is deprecated in favor of terraform apply -refresh-only.
  4. The terraform plan command identifies the drift (difference) between the actual state and the desired state of your infrastructure, indicating which attributes of the configuration differ. For example (simplified output):
~ aws_s3_bucket.example_bucket acl: "public-read" => "private"

Here, the ~ symbol indicates an in-place update: the ACL of the S3 bucket resource has drifted from private to public-read, and the plan proposes changing it back to private to match the desired state.

  5. You can correct the drift by running the terraform apply command. This updates your resources so that they match the desired state defined in your Terraform configuration.
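The detection step above can be scripted. The sketch below is one possible approach, assuming `terraform` is on the PATH and the working directory has already been initialised. It relies on the `-detailed-exitcode` flag of `terraform plan`, which exits with 0 when there is nothing to change, 1 on error, and 2 when changes are present. The function names are hypothetical.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANINGS = {
    0: "no changes: infrastructure matches the configuration",
    1: "error: the plan could not be produced",
    2: "changes present: possible drift or unapplied configuration",
}


def interpret_plan_exit_code(code: int) -> str:
    """Translate a plan exit code into a human-readable summary."""
    return PLAN_EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_drift_check(workdir: str) -> str:
    """Run `terraform plan` in workdir and describe the outcome.

    Assumes terraform is installed and `terraform init` has been run there.
    """
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return interpret_plan_exit_code(result.returncode)
```

Note that exit code 2 only says the plan is non-empty. The difference may come from drift or from configuration changes that have not been applied yet, so the plan output still needs a look.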

Real World Example

Suppose you have defined a security group in your Terraform configuration to restrict access to your web server. The Terraform code that creates the resource in your main.tf is:

provider "aws" {
  region = "us-east-1"
}

# Security group that restricts inbound HTTP to a single CIDR range
resource "aws_security_group" "web_sg" {
  name        = "web_sg"
  description = "Allow inbound HTTP and HTTPS traffic"

  # Inbound: HTTP (port 80) only from 192.168.1.0/24
  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["192.168.1.0/24"]
  }

  # Outbound: all protocols and ports to any destination
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

In this configuration, the security group allows inbound HTTP (port 80) traffic only from the IP range 192.168.1.0/24 and allows all outbound traffic.

Now, if you update the security group in the AWS Management Console to allow HTTP traffic from any IP address (0.0.0.0/0), it introduces a potential security risk.

This creates a difference between the actual state and the desired state, which is Terraform drift.

Run the terraform plan command in your terminal to compare the actual state with the desired one.

Because of the drift, terraform plan proposes to undo the change made from the console and restore the rule defined in the Terraform files. When you run terraform apply, the drift is corrected, so the infrastructure is secure again and consistent with your configuration.
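If you want to inspect such a plan from a script rather than by eye, one option is to save the plan and export it as JSON with `terraform show -json`. The sketch below reads the `resource_changes` list from that JSON. The sample document is a hypothetical, heavily trimmed excerpt written for this example (including the `aws_vpc.main` entry), not real output.

```python
from __future__ import annotations

import json

# Hypothetical, trimmed excerpt of `terraform show -json <planfile>` output.
SAMPLE_PLAN_JSON = """
{
  "resource_changes": [
    {
      "address": "aws_security_group.web_sg",
      "change": {
        "actions": ["update"],
        "before": {"ingress": [{"from_port": 80, "cidr_blocks": ["0.0.0.0/0"]}]},
        "after": {"ingress": [{"from_port": 80, "cidr_blocks": ["192.168.1.0/24"]}]}
      }
    },
    {
      "address": "aws_vpc.main",
      "change": {"actions": ["no-op"], "before": {}, "after": {}}
    }
  ]
}
"""


def pending_changes(plan: dict) -> list[tuple[str, list[str]]]:
    """List (address, actions) for every resource the plan would modify."""
    changes = []
    for rc in plan.get("resource_changes", []):
        actions = rc["change"]["actions"]
        if actions != ["no-op"]:
            changes.append((rc["address"], actions))
    return changes


plan = json.loads(SAMPLE_PLAN_JSON)
for address, actions in pending_changes(plan):
    print(f"{address}: {', '.join(actions)}")  # aws_security_group.web_sg: update
```

In the sample, `before` holds the value found in the cloud (open to 0.0.0.0/0) and `after` holds the value from the configuration, so an `update` on `web_sg` is the plan reverting the console change.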

What is the importance of drift detection, and why minimize drift?

Drift detection makes it easier to spot differences between the actual state and the desired state of your infrastructure. You should practice drift detection for the following reasons:

  • It maintains consistency between your infrastructure on a cloud provider and your Terraform code. For example, if an S3 bucket's configuration in the cloud no longer matches the Terraform code, the drift is detected and can be corrected.
  • It helps keep your infrastructure secure, because unauthorized changes, such as changes to IAM policies or security group rules, are detected. For example, if someone opens the VPC to all traffic from the UI, drift detection catches the change so you can correct it before it becomes a security vulnerability.
  • You get a clearer audit trail of infrastructure changes, helping you debug and troubleshoot. For example, running terraform plan shows exactly which attributes differ between the desired state and the actual state.
  • The operational risks are reduced since your infrastructure is more stable and reliable. For example, you can correct drift caused by changes made to your EKS cluster and keep the infrastructure stable.
  • You can automate drift detection and correction, allowing you to focus on infrastructure development tasks. For example, an automation tool like Firefly can take care of drift detection and correction while you focus on deploying your EKS configuration.

Best practices for Terraform drift detection

Terraform drift detection tools help you identify drift in your Terraform configuration and prevent its consequences. You can further improve it by making use of the following best practices:

  • Run the terraform plan command frequently to compare your actual state and desired state for any changes.
  • Avoid changing Terraform-managed resources through the cloud CLI, console, scripts, or external automation.
  • Enable detailed auditing or monitoring of your infrastructure to keep track of changes made through Terraform or any other means.
  • Implement policy as code using tools like Open Policy Agent to catch non-compliant changes and discourage changes made outside of Terraform.
  • Use a drift detection tool like Firefly and automate it in your CI/CD pipeline to detect issues early and prevent deployment issues.
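As a rough illustration of the policy-as-code idea, the sketch below checks a JSON plan for security groups whose planned ingress rules allow 0.0.0.0/0. Real Open Policy Agent policies are written in Rego; this is only a Python stand-in for the same kind of rule, and the sample plan and the `open_sg` resource are hypothetical.

```python
from __future__ import annotations


def open_ingress_violations(plan: dict) -> list[str]:
    """Return addresses of security groups whose planned ingress allows 0.0.0.0/0."""
    violations = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_security_group":
            continue
        after = rc.get("change", {}).get("after") or {}
        for rule in after.get("ingress") or []:
            if "0.0.0.0/0" in (rule.get("cidr_blocks") or []):
                violations.append(rc["address"])
                break  # one violation per resource is enough
    return violations


# Hypothetical plan data in the shape of `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_security_group.web_sg",
            "type": "aws_security_group",
            "change": {"after": {"ingress": [{"cidr_blocks": ["192.168.1.0/24"]}]}},
        },
        {
            "address": "aws_security_group.open_sg",
            "type": "aws_security_group",
            "change": {"after": {"ingress": [{"cidr_blocks": ["0.0.0.0/0"]}]}},
        },
    ]
}

print(open_ingress_violations(sample_plan))  # ['aws_security_group.open_sg']
```

A CI/CD step could fail the pipeline whenever this list is non-empty, so a risky rule is caught before it is applied.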

Performing drift detection and correcting drift manually at frequent intervals is time-consuming and inefficient. It requires you to monitor your infrastructure around the clock. If you miss a change, your Terraform configuration is left in an inconsistent, hard-to-manage state, because there are no alerts or audits to tell you when and where the change was made. Automating drift detection and remediation with a tool like Firefly reduces manual intervention, adds alerting, and helps keep your infrastructure in the desired state.

Drift detection tools 

Manually tracking and checking for drift in your Terraform configuration is an overhead. You can be more efficient and reduce Terraform drift by using drift detection tools like:

  • Firefly - It continuously monitors your cloud infrastructure, detects drift from your Terraform configurations, alerts you, and assists with fixing it.
  • Terragrunt - It is a wrapper around Terraform that helps you manage configurations across multiple Terraform modules, so you can run plans, and spot drift, across all of them.
  • Atlantis - It is an application that automates Terraform workflows via pull requests, running plan and apply so that infrastructure changes, and any drift the plan reveals, are visible during review.
  • driftctl - It is an open-source tool that detects and reports infrastructure drift by comparing the real-world state of cloud infrastructure resources with Terraform state files.

How does Firefly assist with Terraform drift?

Firefly helps prevent infrastructure drift by automatically detecting drifts and misconfigurations. It also lets you see the history of changes, roll back to previous settings, and recover deleted assets. Here's how Firefly can help with Drift Detection:

  • Continuous Monitoring

With Firefly's dashboard, you can see your entire infrastructure. All detected drifts and manual changes are trackable in real time, showing exactly where discrepancies occur. This helps your infrastructure stay aligned with your Terraform code, reducing the risk of unexpected changes and misconfigurations.

  • Alerts

Firefly sends alerts on Slack or your email account to inform you of any drift. This ensures that you are immediately aware of any changes, enabling quick action to fix issues and maintain your infrastructure's stability.

Let’s look at how you can create a new notification in Firefly:

  1. Go to Notifications.
  2. Select + Add New.
  3. Choose the event type for which you want to receive notifications from the Event Type dropdown menu.
  4. Under Criteria, select the data source.
  5. Select the Destination (Slack or email) and click Create.
  • Excluded Drifts

Firefly allows you to stop notifications for specific properties of a drifted asset by excluding these properties from drift detection:

  1. Go to Settings > Excluded Drifts.
  2. Use the Search bar to find a specific rule.
  3. Switch the toggle of an existing rule to exclude the asset's properties from Firefly drift detection (stop notifications).
  • Detailed Reports

After integrating your cloud provider account with Firefly, you can see all the configurations stored in your state file on the Firefly dashboard.

If you have made any changes to your cloud configuration using your console, they will display under the Drifted section. When you click on Drifted, you can see that your Terraform code does not match the cloud provider infrastructure state.

The Terraform state has drifted for these storage buckets in the GCP project. Clicking on the bucket under asset history will reveal a detailed version of the changes that caused the drift.

Conclusion

Managing infrastructure drift in Terraform is essential for maintaining the integrity, security, and reliability of your cloud infrastructure. Understanding the difference between the actual state and the desired state, along with the causes of drift, is key to managing your Terraform configuration effectively. Practices like running the terraform refresh and terraform plan commands to detect drift, or using Firefly to enhance drift detection, help keep your cloud infrastructure secure, efficient, and aligned with your configuration, reducing risk and optimizing resource utilization.

Frequently Asked Questions

Q 1. What is infrastructure drift in Terraform?

Infrastructure drift takes place when the actual state of your infrastructure differs from the desired state defined in your Terraform configuration files.

Q 2. How can I detect infrastructure drift with Terraform's built-in tools?

You can detect infrastructure drift by running terraform plan, which refreshes Terraform's view of the actual infrastructure, compares it with the desired state (from the configuration files), and shows any differences.

Q 3. How does Firefly help with infrastructure drift issues?

Firefly provides automated monitoring for infrastructure drift, helping ensure that your managed cloud resources remain in sync with your defined policies and configurations.

Q 4. Can Firefly integrate with Terraform?

Yes, Firefly can integrate with Terraform to help manage and govern your infrastructure as code, providing visibility and control over it across your cloud environment.