Chaos Under Control: Addressing Cloud Infrastructure Drift

By Eran Bibi

Infrastructure drift is more than just a technical nuisance; it’s a pervasive problem that—left unchecked—can compromise your entire organization.

Published Jun 17, 2025

Infrastructure drift is a pervasive challenge for organizations managing cloud resources at scale. While Infrastructure as Code (IaC) offers a structured approach to deploying and maintaining infrastructure, drift still occurs when changes happen outside IaC workflows. This isn’t necessarily anomalous behavior. It can happen at any time: an external contractor makes a change, a high-pressure situation such as an incident requires quick resolution, someone has a lapse in judgment, or a tool is overly privileged.

While we always aspire to maintain perfect IaC hygiene with flawless GitOps processes, in practice this is wishful thinking and impossible to enforce. Instead, we see an overreliance on ClickOps: the manual execution of tasks by clicking through options in software tools, which can be more accessible for users who aren’t familiar with coding or scripting. That manual process is often the cause of infrastructure drift.

Infrastructure drift refers to the divergence between the actual state of infrastructure in the cloud and the desired state defined in IaC tools like Terraform. This discrepancy can lead to security vulnerabilities, reliability issues and operational inefficiencies.

At Firefly, we scan and process more than 55,000 cloud accounts daily, and across those accounts we process almost 320,000 drifts per month, so we understand the sheer magnitude and implications of the infrastructure drift problem. We’ve also seen that 90% of large-scale deployments using IaC experience drift, and about half of those cases go unnoticed. For those organizations, there’s a 100% chance of negative impact, whether it’s on reliability, security or toil.

Common Causes of Infrastructure Drift (Avoidable and Unavoidable)

There are many reasons infrastructure drift is so common, despite growing understanding that it needs to be mitigated. Many of the causes result from everyday maintenance of large-scale cloud infrastructure and high-velocity and high-pressure delivery cycles.

Common reasons infrastructure drift occurs include:

Manual emergency fixes: During incidents or emergencies, engineers often make direct changes to infrastructure through cloud consoles or APIs. These changes can address immediate issues but may bypass IaC pipelines, leading to drift.
Legacy resources: Organizations that adopt IaC midstream may have existing resources that were created manually or with different tools. These unmanaged resources are prone to drift as they fall outside IaC governance.
Automated tools with permissions: Tools such as cloud security posture management (CSPM) systems may have permissions to modify configurations, such as security groups. When these tools make changes outside of IaC workflows, drift is introduced.
Partial IaC adoption: Some organizations implement IaC selectively, managing only new or specific projects with IaC while older or different resources are managed manually. This inconsistency can result in drift across environments.
Environment misalignment: Although production environments are often tightly controlled, staging and development environments may allow more flexibility for developers. Manual changes in these environments can create discrepancies, especially if configurations don’t match across environments.
IaC and cloud API misalignment: Cloud providers frequently update their APIs and services, which can lead to drift if IaC tools aren’t updated to match. This misalignment can cause IaC deployments to diverge from the current cloud state.

Some of these causes are unavoidable. Even the most evolved engineering organizations make manual emergency fixes, and although those changes address the immediate issue, they bypass IaC pipelines and leave discrepancies behind. Likewise, organizations that adopt IaC partway through their cloud journey inherit legacy resources created outside IaC governance, and automated tools such as CSPM systems may hold permissions to modify configurations such as security groups outside of IaC workflows.

What Infrastructure Drift Looks Like

Infrastructure drift can take many forms, often beginning with minor changes that snowball into significant discrepancies.

For instance, consider an AWS Identity and Access Management (IAM) policy managed through Terraform. Drift occurs when someone adds something as simple as an asterisk (*) to the policy, which expands permissions from read-only to full access. Similarly, in a Kubernetes environment, a role with read-only permissions in IaC might be modified to include write and delete permissions in the actual cluster, which can cause a lot of production damage. These seemingly small adjustments can compromise security and lead to unintended access.
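To make the IAM example concrete, here is a minimal Python sketch that compares the actions granted by a policy as declared in IaC with those granted by the live policy, and flags anything that was added, including wildcards. Both policy documents and the helper names (`allowed_actions`, `policy_drift`) are hypothetical; a real check would fetch the live policy from the cloud provider rather than hard-code it.

```python
# Hypothetical policy as declared in IaC: read-only access.
DESIRED_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": "*",
        }
    ],
}

# Hypothetical live policy after a manual edit added a wildcard action.
ACTUAL_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket", "s3:*"],
            "Resource": "*",
        }
    ],
}


def allowed_actions(policy):
    """Collect every action granted by Allow statements in a policy document."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement object is also valid
        statements = [statements]
    actions = set()
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        # "Action" may be a single string or a list of strings.
        actions.update([action] if isinstance(action, str) else action)
    return actions


def policy_drift(desired, actual):
    """Return actions added to or removed from the live policy relative to IaC."""
    desired_actions = allowed_actions(desired)
    actual_actions = allowed_actions(actual)
    added = sorted(actual_actions - desired_actions)
    removed = sorted(desired_actions - actual_actions)
    wildcards = [action for action in added if "*" in action]
    return {"added": added, "removed": removed, "wildcards": wildcards}


if __name__ == "__main__":
    print(policy_drift(DESIRED_POLICY, ACTUAL_POLICY))
```

The sketch only compares action names. It ignores resources, conditions and Deny statements, so treat it as an illustration of the idea and not as a full policy analyzer.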

When drift goes unchecked, it can pose risks beyond minor inconveniences.

Data from our 2024 State of Infrastructure as Code Report shows that drift often does go unchecked. Not only does infrastructure drift frequently go undetected, but even when it is detected, it isn’t remediated right away. Worryingly, 13% of the time, infrastructure drift isn’t fixed at all.

Beyond the major risk of downtime, unaddressed drift can impact the stability and security of your infrastructure. For example, when permissions or configurations change outside IaC, it can open vulnerabilities that attackers might exploit. Drift can also affect service reliability if the infrastructure’s actual state doesn’t match the desired configurations tested in staging. All in all, drift is more than just a technical nuisance, and it can compromise your organization as a whole.

First: Practical Approaches to Proactive Drift Detection

Managing drift effectively requires robust monitoring and detection, as well as tried-and-true methods to mitigate it as quickly as possible.

Below are some handy tips for detecting and managing drift:

Drift monitoring: Terraform’s plan command, Pulumi’s preview command and AWS CloudFormation’s drift detection, run from the command line interface (CLI), can all be used to detect drift. By scheduling regular checks, teams can compare the current infrastructure state with the desired configuration. With Terraform, for example, running plan with the -detailed-exitcode flag returns exit code 2 when the plan contains changes, so a scheduled job can flag the discrepancy and teams can respond accordingly.
GitOps for Kubernetes: For Kubernetes environments, GitOps tools like Argo CD and Flux continuously reconcile the cluster state with the configuration stored in Git. These tools help ensure that any unauthorized changes are quickly reverted, maintaining alignment with the source of truth in Git.
Drift detection tools: Open source tools like driftctl and KubeDiff provide targeted drift detection capabilities. The former works well with IaC tools like Terraform, while the latter is optimized for Kubernetes configurations.
Real-time alerts and routing: Establishing alerting mechanisms is crucial for effective drift management. By integrating IaC tools with Slack or PagerDuty, teams can receive real-time notifications of drift, enabling prompt resolution.
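As one way to wire detection into alerting, the sketch below reads the JSON form of a Terraform plan (as produced by `terraform show -json` on a saved plan file), lists the resources whose planned actions are not a no-op, and builds a message in the `{"text": ...}` shape that Slack incoming webhooks accept. The sample plan, the function names and the `SLACK_WEBHOOK_URL` environment variable are assumptions made for illustration. Note that a non-empty plan can also reflect code changes that simply haven’t been applied yet, not only out-of-band drift.

```python
import json
import os
import urllib.request

# Hypothetical excerpt of `terraform show -json <planfile>` output.
SAMPLE_PLAN = {
    "resource_changes": [
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
    ]
}


def drifted_resources(plan):
    """Return (address, actions) pairs for resources the plan would change."""
    changes = []
    for resource in plan.get("resource_changes", []):
        actions = resource.get("change", {}).get("actions", [])
        if actions and actions != ["no-op"]:
            changes.append((resource["address"], "/".join(actions)))
    return changes


def build_slack_message(changes):
    """Build a Slack incoming-webhook payload, or None when nothing changed."""
    if not changes:
        return None
    lines = [f"Drift detected in {len(changes)} resource(s):"]
    lines += [f"- {address} ({actions})" for address, actions in changes]
    return {"text": "\n".join(lines)}


def send_to_slack(webhook_url, message):
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    message = build_slack_message(drifted_resources(SAMPLE_PLAN))
    print(message["text"] if message else "No drift detected")
    # The message is only sent when a webhook URL is configured.
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")
    if message and webhook_url:
        send_to_slack(webhook_url, message)
```

Keeping the parsing and message-building steps separate from delivery makes it straightforward to route the same drift list to a different channel, such as PagerDuty, without touching the detection logic.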

These techniques are a good way to detect drift, but the end goal is to remediate it.

Next: Strategies for Drift Remediation

Remediating drift can take two main forms: aligning the cloud environment with IaC or updating IaC to reflect the actual state. In cases where manual changes are temporary fixes, reapplying IaC configurations can restore the desired state. However, if manual changes represent necessary adjustments, it’s best to update the IaC templates to align with the actual state, preventing recurring drift.
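The choice between these two forms can be sketched as a small decision function. The `DriftRecord` fields and the wording of the returned actions are hypothetical. Whether a manual change was a temporary fix or a necessary adjustment is still a human judgment, which the code only records.

```python
from dataclasses import dataclass


@dataclass
class DriftRecord:
    """A detected drift plus the team's decision about the manual change."""
    resource: str
    keep_manual_change: bool  # True when the change is a necessary adjustment


def remediation_for(record):
    """Pick one of the two remediation forms for a drifted resource."""
    if record.keep_manual_change:
        # The manual change should persist: bring the code to the cloud.
        return f"update IaC to match the actual state of {record.resource}"
    # The manual change was temporary: bring the cloud back to the code.
    return f"reapply IaC to restore the desired state of {record.resource}"


if __name__ == "__main__":
    for record in [
        DriftRecord("aws_security_group.web", keep_manual_change=False),
        DriftRecord("aws_db_instance.orders", keep_manual_change=True),
    ]:
        print(remediation_for(record))
```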

If you’re just starting out with drift detection, a simple monitoring script using Terraform can provide valuable insights into discrepancies. Although this basic approach may not scale for large deployments, it can be effective for smaller setups or as a proof of concept. For larger environments, tools like Firefly, driftctl or GitOps frameworks provide a more robust solution for handling the complexity of enterprise-scale infrastructures.
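A simple monitoring script of this kind might look like the sketch below. It assumes the Terraform CLI is installed and the working directory is already initialized. The function names and status labels are illustrative. It relies on `terraform plan -detailed-exitcode`, which exits with 0 when there are no changes, 1 on error and 2 when the plan contains changes. A plan with changes may also include code changes that have not been applied yet, so exit code 2 is a signal to investigate and not proof of out-of-band drift.

```python
import subprocess
import sys

# Meaning of the exit codes of `terraform plan -detailed-exitcode`.
EXIT_MEANINGS = {0: "in sync", 1: "error", 2: "drift"}


def interpret_exit_code(code):
    """Map a plan exit code to a status label; unknown codes count as errors."""
    return EXIT_MEANINGS.get(code, "error")


def check_drift(working_dir):
    """Run a plan in working_dir and return (status, combined output)."""
    try:
        result = subprocess.run(
            ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
            cwd=working_dir,
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        # The terraform binary (or the directory) could not be found.
        return "error", "terraform or the working directory was not found"
    return interpret_exit_code(result.returncode), result.stdout + result.stderr


if __name__ == "__main__":
    if len(sys.argv) == 2:
        status, output = check_drift(sys.argv[1])
        print(f"{sys.argv[1]}: {status}")
        if status != "in sync":
            print(output)
            sys.exit(1)  # non-zero exit lets a scheduler flag the run
    else:
        print("usage: python drift_check.py <terraform-working-dir>")
```

Run on a schedule (for example from cron or a CI job), the non-zero exit status is enough for most schedulers to mark the run as failed and notify the team.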

Getting Infrastructure Drift Under Control

Infrastructure drift is an ongoing challenge in cloud environments, but with the right tools and practices, organizations can maintain control over their infrastructure.

By leveraging IaC, monitoring drift proactively and implementing strategies like GitOps, teams can minimize the impact of drift, ensuring infrastructure remains consistent and aligned with organizational needs. Regular drift detection and timely remediation ultimately improve the security, reliability and efficiency of cloud operations, empowering teams to deliver with confidence at the velocity modern companies require.
