Why Should Drift Detection Be Part of CI/CD?
Infrastructure drift happens when the actual state of cloud resources doesn’t match what’s defined in configuration files. This often occurs when a team member makes a quick change directly in the console (ClickOps) to fix an urgent issue - like opening a port in a security group because an application isn’t reachable, or scaling up an instance because traffic has suddenly spiked. It can also happen when a script, such as an AWS CLI command or a PowerShell script, modifies your cloud infrastructure - updating an S3 bucket policy, attaching a new IAM role to an EC2 instance, or changing RDS instance parameters.
Over time, these untracked changes create inconsistencies between the intended configuration and what’s actually running in your infrastructure. This can lead to failed application deployments because resources have been modified outside of Terraform’s scope.
These changes often go unnoticed until an audit flags them or a deployment fails because Terraform attempts to modify a resource that has already been altered. By then, identifying who made the change and why it was necessary can be difficult, especially in larger teams where multiple engineers manage multiple environments.
Manually verifying drift isn’t a practical solution. Running `terraform plan` occasionally or relying on scheduled drift checks won’t catch changes made between scans. Cloud resources are updated throughout the day, whether through direct modifications, external automation, or adjustments triggered by scaling policies.
The most effective way to manage drift is to integrate detection into the CI/CD pipeline. Every time the infrastructure code is updated, it should be validated against the actual cloud state. If there’s a mismatch, it needs to be flagged immediately so that engineers can review and address it before it impacts deployments or security.
To further ensure that drift doesn’t go unnoticed, we can schedule the pipeline to run drift detection automatically - for example, every Monday or Friday. This allows teams to detect unexpected infrastructure changes even when no new code is pushed, ensuring that manual modifications, external automation, or cloud service updates do not silently introduce risks.
What Are the Root Causes of Drift?
Infrastructure drift doesn’t always happen because someone changes a resource through the cloud console or scripts. In many cases, it’s a result of how cloud services operate or how different automation tools interact with infrastructure. Understanding these patterns is key to preventing drift before it becomes a problem.
Asynchronous State Changes
One of the most common causes of drift is asynchronous state changes, where AWS services modify their own configurations. Auto Scaling Groups are a good example - Terraform might define a desired instance count, but if the scaling policy increases or decreases the number of instances based on traffic, Terraform will detect this as drift. Similarly, managed services like AWS RDS or ElastiCache may apply automatic updates or maintenance patches, changing configurations without Terraform being aware of it.
IAM Policy Inheritance And External Modifications
Another issue arises from IAM policy inheritance and external modifications. An IAM role might be defined in Terraform with strict permissions, but if another tool or an engineer updates the policy through the console - such as adding permissions for debugging and forgetting to remove them - Terraform will detect this as an unintended change. Over time, these small adjustments accumulate, making it harder to enforce access control policies consistently.
Conflicting Automation Tools
Drift can also be introduced by conflicting automation tools. Organizations often use multiple infrastructure management tools alongside Terraform - AWS Lambda functions that create or modify resources dynamically, CloudFormation stacks deployed outside of Terraform’s control, or third-party SaaS platforms that integrate with AWS and modify security groups or networking configurations. If these changes aren’t managed within Terraform, they create conflicts that Terraform later flags as drift.
State Misalignment Due To Missing Remote State Locking
State misalignment due to missing remote state locking is another common issue. When multiple engineers work on infrastructure, using local Terraform state files instead of a shared backend like AWS S3 with state locking can cause discrepancies. If one engineer applies changes from an outdated local state while another modifies the same resources, Terraform won’t have an accurate record of the infrastructure’s latest state. This leads to one engineer’s changes overriding the other’s, creating unexpected drift within the infrastructure.
Most teams don’t even realize that drift has happened until Terraform deploys changes and something fails. If an engineer runs `terraform apply` expecting a simple update, but Terraform attempts to revert an “untracked” change, it can disrupt production deployments. Catching drift early, before Terraform tries to modify already-altered resources, is important for maintaining a stable infrastructure.
Drift Detection as Code: Designing a GitHub Actions Workflow
Detecting drift manually isn’t reliable, and running occasional Terraform checks leaves too many gaps. The best way to manage drift is to integrate it into CI/CD so that every infrastructure update is verified against the actual cloud state. This ensures that any unexpected change is detected early, before it impacts deployments or security.
With GitHub Actions, we can automate drift detection by running Terraform checks on every push to the main branch. Additionally, the workflow is scheduled to run automatically every Monday at midnight UTC, ensuring that drift detection happens weekly even if no new commits are pushed. This means that even in weeks when no infrastructure changes are committed, we still get insight into any untracked modifications made outside of Terraform’s control. By integrating a scheduled drift check, teams can stay proactive and catch discrepancies before they impact deployments.
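The trigger section of such a workflow might look like this (a sketch; the cron expression corresponds to Monday at midnight UTC):

```yaml
on:
  push:
    branches: [main]
  schedule:
    - cron: "0 0 * * 1"  # every Monday at 00:00 UTC
```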
This way, any infrastructure change - whether intentional or untracked - is flagged before it causes any issues. The workflow is designed to:
- Run Terraform drift detection on every code change.
- Use GitHub Secrets to authenticate AWS securely.
- Execute `terraform plan -detailed-exitcode` and interpret its output.
- Create GitHub Issues automatically when drift is detected.
- Close previous drift reports when the issue has been resolved.
The workflow file is structured as follows:
1. Configuring GitHub Actions to Run Terraform Drift Detection
GitHub Actions runs a `terraform plan` to compare the actual state of the infrastructure with what is defined in the code. The workflow is triggered on every push to main, ensuring that drift checks happen automatically without manual intervention.
2. Running Terraform Plan and Capturing the Exit Code
Terraform uses exit codes to indicate whether a drift has been detected:
- Exit code 0 → No changes, everything is in sync.
- Exit code 1 → Terraform encountered an error.
- Exit code 2 → Drift detected. The infrastructure has changed from what is defined in Terraform.
The workflow captures the exit code and determines whether drift has occurred. If drift is detected (exit code 2), an alert needs to be generated.
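Inside a CI shell step, that mapping can be captured with a small helper like this (a sketch; the function name is ours):

```shell
#!/bin/sh
# Map Terraform's -detailed-exitcode result to a human-readable status.
#   0 -> in sync, 1 -> error, 2 -> drift detected
interpret_exit_code() {
  case "$1" in
    0) echo "in-sync" ;;
    2) echo "drift" ;;
    *) echo "error" ;;
  esac
}

# Typical usage in a pipeline step (terraform must be on PATH):
#   terraform plan -detailed-exitcode -no-color; ec=$?
#   status=$(interpret_exit_code "$ec")
```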
3. Publishing Drift Details to GitHub Issues
Instead of just failing the workflow, we want to track drift as an issue so teams can investigate and take action. The workflow automatically creates a GitHub Issue when drift is detected and updates the issue if drift remains unresolved.
4. Closing Drift Reports When Issues Are Resolved
When no drift is detected (exit code 0), any previously created GitHub Issues should be closed to avoid false alerts. This step ensures that teams are only notified when drift actually exists.
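Putting the four steps together, a minimal version of the workflow file could look like this (a sketch - job names, secret names, and the issue body are illustrative; `hashicorp/setup-terraform` exposes the plan’s exit code as a step output when its command wrapper is enabled):

```yaml
name: Terraform Drift Detection

on:
  push:
    branches: [main]
  schedule:
    - cron: "0 0 * * 1"   # weekly drift check, Monday 00:00 UTC

permissions:
  contents: read
  issues: write            # required to open/close drift issues

jobs:
  drift-detection:
    runs-on: ubuntu-latest
    env:
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4

      - uses: hashicorp/setup-terraform@v3

      - name: Terraform Init
        run: terraform init

      - name: Terraform Plan
        id: plan
        continue-on-error: true   # exit code 2 (drift) should not abort the job
        run: terraform plan -detailed-exitcode -no-color

      - name: Report drift as a GitHub Issue
        if: steps.plan.outputs.exitcode == '2'
        uses: actions/github-script@v7
        with:
          script: |
            // Open a drift issue; de-duplication and closing logic omitted for brevity.
            await github.rest.issues.create({
              owner: context.repo.owner,
              repo: context.repo.repo,
              title: "Infrastructure drift detected",
              body: "terraform plan exited with code 2 - review the workflow logs.",
            });
```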
By integrating drift detection into CI/CD, teams gain continuous visibility into infrastructure changes, reducing the risk of unexpected failures or security issues. Instead of waiting for audits or deployment failures to uncover drift, it becomes part of the normal workflow - ensuring cloud resources always match what’s defined in Terraform.
Running Terraform Drift Detection in GitHub Actions
With the drift detection workflow in place, it’s time to test it with an example. To do this, we will manually introduce drift into an AWS resource and let GitHub Actions detect it. Once the workflow runs, we’ll examine the Terraform output and see how the system identifies and logs the drift.
Since Terraform manages the state of infrastructure, any changes made outside of it - whether intentional or accidental - should be flagged as drift. In this case, we’ll modify an S3 bucket directly using the AWS CLI, introducing a change Terraform isn’t aware of. The goal is to see how Terraform responds, how GitHub Actions logs the issue, and how we can resolve the drift efficiently.
Manually Introducing Drift into an S3 Bucket
The Terraform setup defines a simple S3 bucket in AWS, with the backend configuration stored in an S3 remote state. This ensures Terraform always has an up-to-date record of the infrastructure.
The backend is configured in backend.tf to store the Terraform state remotely:
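A configuration along these lines would do that (a sketch - the state bucket, key, and the `drift_test` resource name are our placeholders; the managed bucket name matches the one used later in this walkthrough):

```hcl
# backend.tf - remote state (bucket and key are placeholders)
terraform {
  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "drift-demo/terraform.tfstate"
    region = "us-east-1"
  }
}

# main.tf - the bucket under test
resource "aws_s3_bucket" "drift_test" {
  bucket = "firefly-drift-detect-buck483974387"
}
```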
At this point, Terraform expects the bucket `firefly-drift-detect-buck483974387` to have no additional configuration unless explicitly defined in the code.
To introduce drift, we will modify this bucket outside of Terraform using the AWS CLI.
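The tagging call might look like this (note that `put-bucket-tagging` replaces the bucket’s entire tag set):

```shell
aws s3api put-bucket-tagging \
  --bucket firefly-drift-detect-buck483974387 \
  --tagging 'TagSet=[{Key=DriftTest,Value=UnexpectedChange}]'
```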
This command adds a new tag (`DriftTest: UnexpectedChange`) to the bucket, which Terraform is not aware of. This creates a state mismatch - Terraform still believes the bucket has no tags, while AWS now includes one.
Triggering the GitHub Actions Workflow
To detect this drift, we trigger the GitHub Actions workflow by making a dummy commit and pushing it to main:
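An empty commit is enough to fire the push trigger (sketch):

```shell
git commit --allow-empty -m "chore: trigger drift detection"
git push origin main
```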
Once this commit is pushed, GitHub Actions automatically starts the workflow, running `terraform plan` to compare the actual infrastructure state with what’s defined in the Terraform configuration and state file.

Analyzing Terraform’s Output
As expected, Terraform detects the unexpected tag and reports drift. The `terraform plan` output highlights the change:
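Against this drift, the plan output would look roughly like the following (the resource address is an assumption):

```
  # aws_s3_bucket.drift_test will be updated in-place
  ~ resource "aws_s3_bucket" "drift_test" {
        id   = "firefly-drift-detect-buck483974387"
      ~ tags = {
          - "DriftTest" = "UnexpectedChange" -> null
        }
        # (unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```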

Terraform flags the difference, noting that it will remove the tag to restore the bucket to its default and expected state.
Since drift has been detected, Terraform exits with exit code 2, signaling that the infrastructure has changed outside of its control.
Logging Drift as a GitHub Issue
Because Terraform detected drift, GitHub Actions automatically created a new issue in the GitHub repository to track the change. This ensures the drift is logged and assigned for investigation.

The issue includes a summary of the drift, highlighting what has changed and what Terraform expects. This helps engineers quickly identify the problem without running any Terraform commands.
At this point, we have everything in place to detect drift in real time. This eliminates guesswork, reduces the risk of unexpected failures, and ensures that teams are always aware of how infrastructure is evolving outside of Terraform’s control.
By making drift detection a routine part of the CI/CD pipeline, teams gain better visibility, faster response times, and improved infrastructure reliability. Whether or not the drift needs to be fixed immediately depends on the situation, but the key takeaway is this: You can’t fix what you don’t know about.
Best Practices for Long-Term Drift Management in CI/CD
Drift detection helps catch untracked infrastructure changes, but preventing drift in the first place is the real goal. Without proper controls, teams will constantly deal with unauthorized modifications, security gaps, and failed deployments. To minimize drift, teams should focus on controlling changes, enforcing automation, and improving visibility into the state of infrastructure.
Implement State Locking to Prevent Conflicting Changes
Drift often occurs when multiple engineers modify infrastructure simultaneously. Terraform state locking prevents this by ensuring only one person or process can apply changes at a time. Without state locking, teams risk overwriting each other's changes, leading to inconsistencies between Terraform’s state and the actual cloud setup.
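With an S3 backend, locking has traditionally been enabled by pointing Terraform at a DynamoDB table, roughly like this (a sketch; the names are placeholders, and the table must have a `LockID` string partition key):

```hcl
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"  # acquires a lock for each state operation
    encrypt        = true
  }
}
```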
Use AWS Config to Detect Unauthorized Configuration Changes
AWS Config continuously monitors resource configurations and can alert teams when changes occur outside Terraform. This provides an additional layer of visibility beyond GitHub Actions and helps catch drift caused by external automation or direct AWS console modifications.
Log and Audit Changes Using AWS CloudTrail
AWS CloudTrail records every API call made to AWS services, providing a clear history of who changed what and when. If Terraform detects drift, teams can use CloudTrail logs to identify the source of the change, whether it was an automation script, a service update, or a manual modification.
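For example, recent API calls that touched the drifted bucket can be pulled with `lookup-events` (bucket name as used earlier in this walkthrough):

```shell
aws cloudtrail lookup-events \
  --lookup-attributes AttributeKey=ResourceName,AttributeValue=firefly-drift-detect-buck483974387 \
  --max-results 10
```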
By combining these best practices within your workflow, teams can drastically reduce infrastructure drift and maintain a predictable, stable cloud environment.
Firefly: A Smarter Way to Detect and Manage Infrastructure Drift
We started our drift detection process by running `terraform plan`. That’s usually the first instinct when checking for infrastructure drift - run a plan, compare the expected state with the actual infrastructure state, and see if anything has changed. But running `terraform plan` manually every time isn’t a practical solution. It only provides a snapshot at that moment and doesn’t continuously monitor the infrastructure.
To automate this, we moved drift detection into a GitHub Actions workflow. Instead of relying on engineers to remember to check for drift, we ensured that Terraform runs automatically on every push to the main branch. If a drift is detected, the workflow logs the issue and alerts the team. This approach improved visibility and made drift detection part of the CI/CD process, but it still had a limitation - Terraform only detects drift when the pipeline runs. That means drift could still go unnoticed between deployments.
Now, what if there was a way to continuously track drift without running Terraform at all? What if we could integrate our cloud provider, see all changes, and get a complete picture of our infrastructure? That’s where Firefly comes in.
Firefly goes beyond Terraform’s perspective. Instead of only detecting drift in Terraform-managed resources, it scans your entire infrastructure - whether resources are managed through Terraform or created manually. It categorizes everything, showing how much of the infrastructure is codified, how much is unmanaged, and which resources have drifted from their expected state. This level of insight ensures teams can track misconfigurations, prevent unmanaged resources from accumulating, and maintain infrastructure compliance.
Even small configuration drifts can have operational and financial consequences. Firefly provides a detailed comparison of drifted resources, highlighting what changed between the expected and actual states.
In this example, Firefly detected a configuration drift in a Google Cloud Compute disk. The original Infrastructure as Code (IaC) configuration specified a disk size of 17GB, but the actual running configuration shows it was manually increased to 18GB.

Terraform would not be aware of this unless a terraform plan was run. If left untracked, this could cause inconsistencies between the desired and actual state, leading to unintended rollbacks or conflicts in future deployments.
Drift doesn’t just affect infrastructure consistency - it also impacts costs. Firefly automatically calculates the cost difference introduced by configuration drift, allowing teams to track how infrastructure changes impact monthly cloud expenses.
Here, Firefly detected that increasing the disk size from 17GB to 18GB resulted in a cost increase from $2.89 to $3.06 per month. While this might seem minor in isolation, cumulative drifts across multiple resources can lead to significant unexpected expenses.

By providing real-time visibility into cost fluctuations caused by drift, Firefly helps teams make informed decisions about whether to revert or adopt changes into their infrastructure code.
Once drift is detected, teams need an efficient way to respond. Firefly doesn’t just identify drift - it also generates the exact Infrastructure as Code changes needed to align the resource with its actual configuration.
In this case, Firefly suggests updating the Terraform configuration to match the actual disk size of 18GB. The auto-generated remediation can be directly applied via a pull request, allowing teams to codify the change instead of manually updating Terraform files.

With this approach, Firefly eliminates the guesswork from drift management. Instead of relying on engineers to manually investigate and update Terraform code, the suggested fix can be reviewed, approved, and merged, ensuring the infrastructure remains fully codified and in sync with reality.
By integrating Firefly’s drift detection into workflows, teams can catch infrastructure misconfigurations before they become issues, track cost-related drift, and codify changes selectively rather than forcing rollbacks. Instead of waiting for a Terraform plan to run, Firefly provides real-time tracking of all cloud resources. Whether a resource was created through Terraform, the AWS console, or an automation script, Firefly ensures that nothing goes unnoticed. Teams no longer have to wonder if their infrastructure is aligned with their code - Firefly gives them a complete, always-up-to-date view of their environment.
Frequently Asked Questions
How do you detect infrastructure drift in Terraform?
Run `terraform plan -detailed-exitcode` to compare the actual cloud state with the Terraform configuration. An exit code of 2 indicates drift; an exit code of 1 indicates an error.
What are the common causes of Terraform drift?
Drift occurs due to manual changes in the cloud console, automation scripts modifying resources, missing state locking, or AWS services auto-updating configurations.
How do you prevent drift in Terraform-managed infrastructure?
Use state locking, enforce IAM policies to restrict manual changes, monitor configurations with AWS Config, and integrate drift detection into CI/CD pipelines.
Can GitHub Actions be used for Terraform drift detection?
Yes, GitHub Actions can run `terraform plan` on every push to detect drift and automatically log issues when discrepancies are found.