Keeping your cloud infrastructure secure and stable is important, but it gets harder to manage as it grows. You might change an EC2 instance using the AWS CLI to test out a solution and never carry that change into your automation. Small changes like these add up over time, and your live cloud setup can ultimately drift away from what’s defined in your IaC code, creating security risks and unexpected issues that might affect your infrastructure.
In this blog, we’ll discuss configuration drift and identify its root causes. Then, we’ll explore why it’s important to spot it early, and how you can prevent and remediate configuration drifts in your infrastructure.
What is Cloud Configuration Drift?
Cloud Configuration Drift, also referred to as cloud drift, happens when the actual configuration of your cloud resources no longer matches the configuration defined in your IaC config, such as the .tf files you use with Terraform. For example, if someone changes the security group settings of an EC2 instance through the AWS console to open additional ports, that creates drift. This kind of change could expose your infrastructure to security risks by allowing unauthorized access.
And since it is not a change made using IaC, you can’t track it.
Cloud infrastructure drift can also start when you adjust an S3 bucket's access policy or change an instance type directly in the AWS console to meet a testing need. Doing so seems like a great way to save time, but the change never reaches your IaC.
Eventually, these small changes can build up, making it harder to manage your infrastructure. And unless you monitor, detect, and address these drifts regularly, your automation will start to fail and become inconsistent.
What Causes Configuration Drift?
Configuration drift often occurs due to changes made to fix urgent issues or temporary adjustments through UI or CLI that aren’t reflected in the IaC configuration.
Let’s explore some of the main reasons why this happens:
Manual Changes
One of the most common causes of configuration drift is making changes directly through the cloud provider’s interface. These changes often happen when there's an urgent need, such as resolving business application downtime, quickly fixing an ongoing incident like a sudden spike in traffic, or applying an update to address a newly discovered security vulnerability. This might involve adjusting your infrastructure, such as increasing the storage capacity of an EC2 instance using the AWS Console. However, if you don’t update your IaC code to reflect these changes, your infrastructure and your code drift apart.
This means that the actual state of your EC2 instance no longer matches what’s defined in your IaC config files, such as those created with Terraform or Pulumi. For example, if you initially defined the instance type as t2.micro in your Terraform code, but someone changes it to t3.micro via the AWS UI, this creates a configuration drift.
Additionally, if someone on your team made these changes without informing the rest of the team or updating the IaC config, you might not even know that the drift occurred.
To understand how configuration drift can occur, let’s walk through an example using Terraform. Changing an EC2 instance’s storage size through the AWS UI creates a mismatch between your cloud infrastructure and the configuration defined in your Terraform code, resulting in a configuration drift. Let us define an EC2 instance:
We’ve defined an EC2 instance with a root block device that has a storage size of 10 GB. Let’s say your teammate logs into the AWS Management Console, notices that 10 GB of storage isn’t sufficient for the workload running on this EC2 instance, and increases the volume size to 20 GB directly from the console without updating the Terraform code.
The EC2 instance now has a 20 GB volume, but the Terraform code still defines it as 10 GB, creating a configuration drift. We can perform Terraform drift detection by running terraform plan. This command checks for differences between the actual state of your infrastructure and what’s defined in your Terraform code.
The output from terraform plan clearly shows a difference between the current state with 20 GB of storage and the Terraform code, which specifies 10 GB.
You can now decide what to do next:
- Update the Terraform code: Adjust the Terraform code to reflect the new 20 GB volume size so your Terraform code and infrastructure match.
- Revert the change: If the change to a 20 GB volume was made by mistake, run terraform apply to change it back to 10 GB within your infrastructure. This will make sure that your infrastructure matches your Terraform code.
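The idea behind drift detection, comparing what the code declares with what actually exists, can be illustrated with a minimal Python sketch. This is not how Terraform works internally; the attribute names (instance_type, root_volume_size_gb) and the two snapshots are hypothetical and only mirror the EC2 example above.

```python
from __future__ import annotations

from typing import Any


def find_drift(declared: dict[str, Any], actual: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {attribute: (declared_value, actual_value)} for every mismatch."""
    drift = {}
    # Look at attributes from both sides, so settings that were added or
    # removed outside of IaC are reported too (the missing side is None).
    for key in sorted(declared.keys() | actual.keys()):
        declared_value = declared.get(key)
        actual_value = actual.get(key)
        if declared_value != actual_value:
            drift[key] = (declared_value, actual_value)
    return drift


# Hypothetical snapshots: the code says 10 GB, the console change made it 20 GB.
declared = {"instance_type": "t2.micro", "root_volume_size_gb": 10}
actual = {"instance_type": "t2.micro", "root_volume_size_gb": 20}

print(find_drift(declared, actual))  # {'root_volume_size_gb': (10, 20)}
```

Each entry in the result corresponds to one of the two choices above: either the declared value is updated to match, or the actual value is reverted.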
Using Scripts
Configuration drift doesn’t just happen due to changes via the cloud’s UI; it can also occur when using tools and scripts like PowerShell, AWS CLI, or Ansible because these tools don’t maintain state. Unlike IaC tools like Terraform or Pulumi, which track the state of your infrastructure, these other tools execute commands directly on the infrastructure.
For example, you might use the AWS CLI to update an S3 bucket’s settings or adjust security policies. This creates drift if the changes aren’t also made in your IaC config. Suppose you have an S3 bucket that was created and managed by Terraform. Initially, everything is in sync, and your Terraform state file accurately reflects your infrastructure. However, someone on your team then uses the AWS CLI to update the tags on this bucket instead of modifying the Terraform code.
The S3 bucket now has tags that were added using the AWS CLI. However, these tags are not recorded in the Terraform state file, meaning the actual state of the bucket in AWS no longer matches what’s defined in your IaC config.
To identify the drift caused by using the AWS CLI, you would run terraform plan. For the S3 bucket, the plan output will show where the current setup doesn’t match the Terraform code and list the bucket as a resource to be updated.
The output shows that the tags on the S3 bucket have been modified. Terraform detects that the actual tags on the bucket don’t match what’s specified in the Terraform configuration.
To resolve this drift, you have two main options:
- Update the Terraform code: You can update the Terraform configuration to include the new tags. This ensures that your IaC configs are in sync with the actual state of the bucket, preventing future drifts.
- Revert the changes: If the tags were added for testing or are not needed, you could remove them using the AWS CLI or by running terraform apply, which would revert the tags to your original Terraform configuration.
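If you want to review tag drift programmatically rather than by reading the plan text, one option is to save the plan (terraform plan -out=tfplan) and export it with terraform show -json tfplan. The sketch below reads the resource_changes section of that JSON. It assumes the plan was created with the default refresh, so the before value reflects what exists in AWS and after reflects the configuration. The sample document is hand-written in the shape of the JSON plan format, with a made-up bucket name and tags; it is not real Terraform output.

```python
from __future__ import annotations

import json
from typing import Any


def tag_drift(plan: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Map resource address -> actual vs. declared tags for planned updates."""
    report = {}
    for resource_change in plan.get("resource_changes", []):
        change = resource_change.get("change", {})
        # Only in-place updates are interesting here; creates and deletes are skipped.
        if "update" not in change.get("actions", []):
            continue
        actual = (change.get("before") or {}).get("tags") or {}
        declared = (change.get("after") or {}).get("tags") or {}
        if actual != declared:
            report[resource_change["address"]] = {"actual": actual, "declared": declared}
    return report


# Hand-written sample, not real output: the "Owner" tag was added via the CLI.
sample_plan = json.loads("""
{
  "resource_changes": [
    {
      "address": "aws_s3_bucket.example",
      "change": {
        "actions": ["update"],
        "before": {"bucket": "example-bucket", "tags": {"Environment": "dev", "Owner": "cli-user"}},
        "after": {"bucket": "example-bucket", "tags": {"Environment": "dev"}}
      }
    }
  ]
}
""")

print(tag_drift(sample_plan))
```

A report like this makes the decision explicit for each resource: copy the actual tags into the code, or let the next apply restore the declared ones.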
Why is it important to identify configuration drift?
Drift can create mismatches between your infrastructure and IaC configs, causing unexpected errors or downtime. It can also result in your systems falling out of compliance with regulations, leading to potential fines and security risks. Additionally, drift can cause automated deployments to fail and increase costs due to misconfigured or over-provisioned resources.
Let’s break down these challenges to understand why detecting and addressing drift is important:
- You risk inconsistent IaC
Configuration drift happens when changes to your cloud infrastructure aren’t recorded in your IaC config. This can lead to deployment failures when you try to manage or update your infrastructure.
For example, say you’ve set up an EC2 instance using Terraform, and the security group settings are defined in your IaC. Later, someone from your team changes the security group rules in the AWS UI to allow access from a different IP address. If this change isn’t updated in the code, your IaC will be out of sync with the actual setup.
The next time you deploy your infrastructure, Terraform will reset the security group rules to match what’s in the IaC config, removing the IP address that was added through the UI and causing access issues for users who rely on that IP address. This might cause business loss.
When your IaC doesn’t align with your actual infrastructure, the code no longer represents the real state of your resources, making it harder to track. For example, if there’s a problem with access or configuration, you might look at your IaC config expecting it to show the current setup, but it doesn’t. This mismatch can lead to delays in identifying and fixing the problem, as what’s in the code doesn’t reflect the actual environment.
- You may have to deal with compliance violations
Many industries have strict guidelines on how infrastructure should be configured to ensure security, privacy, and reliability. When your actual infrastructure doesn't match what's written in your IaC, it can lead to non-compliance, which may result in fines, penalties, or legal trouble.
For example, suppose your organization needs to follow regulations like GDPR or HIPAA, and your compliance policy requires all data stored in S3 buckets, such as customer information or financial records, to be encrypted. Your IaC is set up to ensure that encryption is always turned on. But if someone disables encryption on a bucket through the AWS UI and this change isn’t updated in your IaC, the bucket is now out of compliance. This drift might not be noticed until an audit is done for the deployed resources, at which point your company could face fines or other penalties for not following the required standards.
Drift can also make it hard to pass security audits. Auditors will check if your actual setup matches your documented configuration. If they find differences, it could lead to deeper scrutiny and bigger problems.
- It can cause deployment failures
Configuration drift can cause deployment failures by creating differences between your actual infrastructure and IaC config. If you manage your infrastructure deployments using CI/CD pipelines, drift can make a deployment fail because of duplicate resources or exceeded limits.
For example, say you’ve set up a CI/CD pipeline that applies the Terraform code to configure your environment with roles and users. One of your team members creates the ‘cost-optimizer’ role using the AWS Console to test access but doesn’t update the Terraform code. When you later deploy code that defines the same role, your deployment fails because the role already exists. This causes delays as you rush to fix the issue by importing the resource or removing the existing role. Regularly checking for drift and keeping your IaC updated is important to avoid these disruptions.
- It could lead to an increase in unnecessary costs
Configuration drift can lead to higher costs by making your cloud resources either misconfigured or over-provisioned. For example, if someone manually upgrades an EC2 instance from a t2.micro to an m5.large to handle a temporary surge in traffic, and this change isn’t updated in your IaC, you could end up paying for more resources than needed.
This drift can increase your cloud bills, as the oversized instance keeps running after the surge has passed. In a cloud environment like AWS or GCP where costs are based on usage, these unnoticed changes can lead to unnecessary expenses. Keeping your infrastructure aligned with your IaC helps you ensure that you’re only paying for what you need.
How to Prevent Configuration Drift?
To keep your infrastructure aligned with your IaC, it’s important to prevent configuration drift, which can cause unexpected errors, deployment failures, or security vulnerabilities.
But without proper governance over changes made to your cloud configuration, that’s nearly impossible. Detecting drift isn’t easy, and for some teams, remediation doesn’t happen at all.
Firefly’s 2024 State of Infrastructure as Code report shows that:
- Over the last year, many cloud practitioners have adopted dedicated tools to detect configuration drift
- Still, 20% of survey respondents report that they can’t detect drift. And most others aren’t able to do so until their organization has been exposed to (and vulnerable as a result of) unauthorized changes for days or even weeks.
Want to ensure you stay on top of your drift management? Let’s look at steps you can take to keep your infrastructure in sync with your IaC:
Tip: Use IaC Consistently
One of the most effective ways to prevent configuration drift is to ensure that all infrastructure changes are made through IaC tools like Terraform rather than the cloud provider’s UI. This practice helps keep your infrastructure consistent and avoids the risks associated with untracked changes.
For example, say you manage an RDS database instance using Terraform, defining its configuration, including storage size, instance class, and backup settings. One day, a team member realizes that the backup retention period needs to be extended to meet a sudden compliance request. To solve the problem, the team member logs into the AWS Management Console and changes the backup retention period through the UI.
This adjustment resolves the issue temporarily but introduces configuration drift. Now, the RDS instance’s backup settings don’t match what’s defined in the Terraform code. If you later run terraform apply, Terraform will reset the backup settings to the original configuration, reducing the retention period and risking data loss.
This situation shows why it’s important to make all changes through IaC tools like Terraform. When you consistently use IaC:
- All changes are tracked and documented: Every change you make is reflected in the code, making it easier to review and understand what’s been done.
- You avoid unexpected reversions: Since all changes are applied through Terraform, there are no adjustments made through the AWS UI for a later apply to overwrite, reducing the risk of downtime or other issues.
- Team collaboration is smoother: Everyone on the team is working from the same set of configurations, ensuring consistency and reducing the chance of miscommunication.
Tip: Conduct Regular Audits
Regular audits are important for keeping your infrastructure in check and preventing configuration drift from going unnoticed. By routinely comparing your current setup with your IaC config, you can identify and fix drift before it builds up.
Let’s say you manage a cloud environment with various resources like EC2 instances, S3 buckets, WAF rules, and RDS databases. Over time, small changes might be made, sometimes without your knowledge. For example, there was an incident with some malicious API attacks, and you created WAF rules using the AWS console to prevent them. This change can create drift, where your actual infrastructure no longer matches what’s defined in your IaC.
To stay ahead of these issues, you perform regular audits. For example, you might manually check whether your infrastructure matches what’s in your IaC config, using the tags you set in your Terraform code to tell managed resources apart. During one of these audits, you might find that an EC2 instance has a security group setting that doesn’t match your IaC config, causing unexpected URL blocks or access issues with the applications.
Catching this drift early allows you to either update your Terraform code to match the current state or revert the change to align with your original plan. Regular audits like this help keep your infrastructure consistent, minimize risks, and ensure everything runs smoothly. With consistent audits, you can:
- Spot Unapproved Changes: Audits help you catch changes made outside of your IaC process before they lead to bigger issues.
- Stay Compliant: Regularly verifying your setup against standards ensures you’re meeting regulatory requirements.
- Avoid Surprises: By detecting drift early, you can fix problems before they cause downtime, security gaps, or unexpected costs.
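The tag-based audit mentioned above can be sketched in a few lines of Python. The sketch assumes your Terraform code applies a marker tag to everything it creates; the tag name managed-by, the resource IDs, and the inventory list are all hypothetical. In practice the inventory would come from your cloud provider’s APIs or an inventory export.

```python
from __future__ import annotations


def find_unmanaged(resources: list[dict], required_tags: dict[str, str]) -> list[str]:
    """Return IDs of resources missing any tag that the IaC code always applies."""
    flagged = []
    for resource in resources:
        tags = resource.get("tags", {})
        # A missing or different marker tag suggests the resource was
        # created or modified outside the IaC workflow.
        if any(tags.get(name) != value for name, value in required_tags.items()):
            flagged.append(resource["id"])
    return flagged


# Hypothetical inventory: the WAF rule was created in the console during an incident.
inventory = [
    {"id": "i-web-01", "tags": {"managed-by": "terraform", "env": "prod"}},
    {"id": "waf-rule-hotfix", "tags": {}},
    {"id": "db-main", "tags": {"managed-by": "terraform"}},
]

print(find_unmanaged(inventory, {"managed-by": "terraform"}))  # ['waf-rule-hotfix']
```

A check like this only finds resources that lack the marker tag. It does not catch changed settings on resources that are tagged correctly, so it complements a plan-based comparison rather than replacing it.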
Tip: Use CI/CD Pipelines
Regularly running the CI/CD pipeline is another way to prevent configuration drift. By integrating checks for drift into your CI/CD process, such as running terraform plan frequently, you can catch inconsistencies between your infrastructure and IaC early.
For example, you can set up your CI/CD pipeline to automatically run terraform plan on a schedule and on each commit to the infrastructure code. This will help identify any drift by comparing the current state of your infrastructure with the IaC configuration.
If any drift is detected, the pipeline can alert you and fail, allowing you to address the drift before it affects your deployment. This practice ensures your infrastructure remains in sync with your IaC, reducing the risk of errors, downtime, or security issues caused by drift.
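One way to wire such a check into a pipeline step is to rely on the -detailed-exitcode flag of terraform plan, which exits with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The Python sketch below wraps that call. The function names and the injectable runner parameter are illustrative choices, and the demo at the bottom uses a stand-in runner so the snippet runs without Terraform installed.

```python
from __future__ import annotations

import subprocess
from typing import Callable

# Exit codes used by `terraform plan -detailed-exitcode`.
NO_CHANGES, ERROR, CHANGES_PRESENT = 0, 1, 2


def classify(exit_code: int) -> str:
    """Translate the plan exit code into a pipeline-friendly status."""
    if exit_code == NO_CHANGES:
        return "in-sync"
    if exit_code == CHANGES_PRESENT:
        return "drift"
    return "error"


def check_drift(workdir: str, runner: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run terraform plan in workdir and report in-sync, drift, or error."""
    result = runner(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify(result.returncode)


# Demo with a stand-in runner that pretends the plan found changes.
def fake_runner(*args, **kwargs):
    return subprocess.CompletedProcess(args=args, returncode=CHANGES_PRESENT)


print(check_drift(".", runner=fake_runner))  # drift
```

In a real pipeline step you would call check_drift on the Terraform directory with the default runner and fail the job on anything other than "in-sync". Note that a non-empty plan can also mean the code has changes that were never applied, so a "drift" result on a scheduled run against an unchanged branch is the clearest signal of out-of-band changes.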
Tip: Lean on Proactive Alerts and Monitoring
Using monitoring and alerting systems is an effective way to keep track of your infrastructure. These tools help you spot unplanned or unauthorized changes so you can address them quickly.
For example, if you’re managing a cloud setup with multiple team members making changes, you can use tools like AWS Config or CloudWatch to monitor resources like EC2 instances and S3 buckets.
If someone changes the security settings on an S3 bucket through the AWS UI, the monitoring system can send you an alert, allowing you to quickly detect the change and decide whether to approve or reverse it. Let’s discuss the importance of alerts and monitoring:
- Instant alerts: You find out about changes as soon as they happen.
- Quick fixes: You can quickly deal with any changes that weren’t planned.
- Consistency: Regular monitoring helps keep your infrastructure consistent and aligned with your IaC.
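To keep alerts useful rather than noisy, the alerting logic usually has to separate expected changes (made by your IaC pipeline) from out-of-band ones. The Python sketch below shows that filtering step only. The event shape (resource_type, resource_id, changed_by) is a simplified, hypothetical format and not the actual payload of AWS Config or CloudWatch, and the principal names are made up.

```python
from __future__ import annotations

# Resource types to watch, written in the AWS::Service::Resource style.
WATCHED_TYPES = {"AWS::S3::Bucket", "AWS::EC2::SecurityGroup"}


def build_alert(event: dict, iac_principals: set[str]) -> str | None:
    """Return an alert message for out-of-band changes to watched resources."""
    if event["resource_type"] not in WATCHED_TYPES:
        return None
    if event["changed_by"] in iac_principals:
        # The change came from the IaC pipeline, so it is expected.
        return None
    return (
        f"Out-of-band change to {event['resource_type']} "
        f"{event['resource_id']} by {event['changed_by']}"
    )


# Hypothetical events: one console change and one pipeline change.
pipeline_roles = {"terraform-ci"}
console_change = {"resource_type": "AWS::S3::Bucket", "resource_id": "example-bucket", "changed_by": "alice"}
pipeline_change = {"resource_type": "AWS::S3::Bucket", "resource_id": "example-bucket", "changed_by": "terraform-ci"}

print(build_alert(console_change, pipeline_roles))
print(build_alert(pipeline_change, pipeline_roles))  # None
```

The message returned for the console change would then be forwarded to whatever notification channel your team uses.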
Using Firefly for Drift Detection (and Correction)
We know that drift happens. Concerningly, however, drift remediation often doesn’t.
Data from our 2024 State of IaC report shows that when it comes to drift remediation, fewer than half of respondents can implement a fix within 24 hours. Even more worryingly, 13% do not fix the issue at all.
That’s never the case with Firefly.
Firefly helps you prevent configuration drift by automatically detecting drifts and misconfigurations, making it easier to keep your cloud environment consistent. With Firefly, you can monitor drifts, view change history, and roll back to previous settings if needed.
Let’s explore how Firefly can assist with Drift Detection:
Monitor Drift in Firefly's Dashboard
Firefly's centralized dashboard provides a clear and comprehensive view of your infrastructure, allowing you to monitor for configuration drift. As shown in the screenshot, you can see detailed insights into your cloud resources, such as the percentage of unmanaged assets, the occurrence of drift, and the status of various IaC stacks. The visual representation makes it easy to spot where drift has occurred and how your infrastructure is aligned with your IaC.
You can quickly identify issues like drift, unmanaged resources, or potential cost savings and take immediate action to address them. This dashboard not only shows the current state but also tracks changes over time, helping you maintain a consistent and secure cloud environment.
By clicking on the drifted data source, you can see exactly what has changed compared to your IaC. This provides a comprehensive breakdown of the drift, highlighting differences in properties, tags, and other key configurations. You can either codify these changes back into your IaC or revert the resource to match the original configuration, ensuring that your infrastructure remains consistent and secure.
With options like ‘Drift Details’, ‘Codify’, and ‘Migrate’, you can explore the specific changes that caused the drift, update your IaC to match the current state or revert the resource to its original configuration. This detailed view in Firefly allows you to quickly take precise actions, ensuring your infrastructure remains consistent with your IaC and functions smoothly without any deployment failures.
Codify your Drift
Firefly allows you to automatically generate codified versions of your infrastructure, including unmanaged resources. As shown in the image below, Firefly presents a detailed view of the infrastructure's code, highlighting key attributes such as instance type, storage settings, and security groups. This codified view makes it easy to incorporate any unmanaged resources back into your IaC setup.
You can directly export this codified configuration, create pull requests, or integrate it into your existing IaC tools like Terraform, Pulumi, or Ansible. This feature helps ensure that your infrastructure is always aligned with your desired state, reducing the risk of drift and making it easier to manage and scale your cloud environment.
Stay Informed with Alerts
Firefly makes it easy to stay informed about any configuration drift by sending alerts directly to your Slack or email. This ensures you're immediately aware of any changes, allowing you to quickly address issues and keep your infrastructure stable. Let’s look at how you can set up a new notification in Firefly:
- Navigate to ‘Notifications’.
- Click on ‘+ Add New’.
- From the ‘Event Type’ dropdown, select the event you want to be notified about.
- Under ‘Criteria’, choose the relevant data source.
- Select your notification ‘Destination’ (Slack or email) and click ‘Create’.
With these steps, you’ll receive timely alerts that help you maintain control over your cloud environment with quick feedback.
Firefly is a tool that simplifies cloud management by helping you detect and correct configuration drift. With features like monitoring, detailed codification, and instant alerts, you can keep your infrastructure aligned with your IaC. It makes it easy to manage your cloud environment, preventing costly mistakes using a single platform. By using Firefly, you can streamline your infrastructure and keep it running smoothly.