In CloudFormation, drift happens when the actual state of your infrastructure doesn’t match the configuration defined in your CloudFormation templates. This can occur when changes are made directly through the cloud console, shell scripts, or another IaC tool instead of using the CloudFormation stack.
Let’s say you have created an S3 bucket using a CloudFormation template, and one of your team members updates its access policy through the AWS Console. This update won’t appear in the CloudFormation template, resulting in infrastructure drift.
What are the risks of configuration drift?
When drift occurs in a CloudFormation stack, it can affect the stability, security, and cost of your infrastructure:
- IaC Inconsistency
If the instance type of an EC2 instance is changed from t2.micro to t3.medium, or a security group is modified directly through the AWS Console, the change isn't reflected in the CloudFormation template. In an infrastructure managed with code, all resources should be configured through the defined templates to stay consistent. Changes made outside the code create a mismatch between the cloud resources and the IaC template, and that mismatch can cause failed updates, rollback loops, or misconfigurations the next time the stack is deployed or updated.
- Compliance Violations
In regulated environments, drift can cause your infrastructure to become non-compliant. For example, you may have specific IAM roles or security configurations in your CloudFormation template to meet regulations like GDPR or HIPAA. If those configurations are changed directly in the cloud console or through shell scripts, such as modifying access permissions or turning off encryption, your environment might no longer meet these requirements, putting you at risk of fines, failed audits, or security incidents.
- Increased Costs
Drift can lead to misconfigured or over-provisioned resources, increasing your cloud bills. For example, if someone from your team changes an EC2 instance type in the console, like upgrading from a t2.micro to a t2.large, it can increase your bills. Similarly, if the number of nodes in your EKS cluster is increased through the console, it can result in more resources being used than necessary. Over time, these changes can add up and lead to unexpected costs that are difficult to identify and address during audits.
How do you manage configuration drift?
Here is what you can do to avoid the issues discussed above and keep your CloudFormation stacks consistent with your planned configuration.
- Regularly Check for Drift
One of the easiest ways to identify drift is by regularly checking your CloudFormation stacks for differences. AWS provides a built-in feature to detect drift in your CloudFormation resources, and you can run it from the AWS Console, the AWS CLI, or the SDKs. With the AWS CLI, you first start a detection operation with the detect-stack-drift command and then run the describe-stack-resource-drifts command, which shows which resources no longer match the expected template configuration and need to be updated. By performing this check regularly, for example every week, you can catch problems early, before they cause major issues.
- Avoid Changes through the Console
It’s best practice to avoid modifying resources through the AWS Console. Any changes made directly in the console or through scripts can cause drift and introduce inconsistencies between your infrastructure and your CloudFormation templates. All changes should be made using CloudFormation templates or other IaC tools like Terraform to make sure that your infrastructure is always in sync with the config.
- Standardize IaC Tools
In an environment where multiple tools are used, drift can happen when different IaC tools manage the same resources. To prevent this, it’s best to standardize the tools your team uses. By using one tool, like CloudFormation or Terraform, for all infrastructure management, you can make sure that everyone is on the same page. This helps avoid conflicting changes that could lead to drift.
- Apply Tagging Standards
Another helpful practice is to apply consistent tagging standards across your resources. Tags can help you identify resources that are part of a CloudFormation stack and make it easier to track changes. By standardizing tagging conventions, you can easily spot resources modified outside of CloudFormation and quickly handle any possible drift.
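As a rough illustration of the tagging practice, the sketch below groups resources by the `aws:cloudformation:stack-name` tag, which CloudFormation adds automatically to the resources it creates when the resource type supports tags. The inventory shape (a list of dicts with `id` and `tags`), the sample IDs, and the `owner` and `environment` tag names are assumptions made for this example, not part of any AWS API. A missing stack tag is only a hint that a resource is unmanaged, since not every resource type can be tagged.

```python
# Tag that CloudFormation applies automatically to taggable resources it creates.
CFN_STACK_TAG = "aws:cloudformation:stack-name"

# Hypothetical team convention; replace with your own required tags.
REQUIRED_TAGS = ("owner", "environment")


def audit_tags(resources):
    """Group resources by owning stack and list missing required tags."""
    by_stack = {}
    unmanaged = []
    missing = {}
    for res in resources:
        tags = res.get("tags", {})
        stack = tags.get(CFN_STACK_TAG)
        if stack is None:
            # No stack tag: possibly created outside CloudFormation.
            unmanaged.append(res["id"])
        else:
            by_stack.setdefault(stack, []).append(res["id"])
        absent = [name for name in REQUIRED_TAGS if name not in tags]
        if absent:
            missing[res["id"]] = absent
    return {"by_stack": by_stack, "unmanaged": unmanaged, "missing": missing}


# Hand-written sample inventory (hypothetical IDs and tag values).
inventory = [
    {"id": "vpc-0aaa", "tags": {CFN_STACK_TAG: "network", "owner": "platform", "environment": "prod"}},
    {"id": "sg-0bbb", "tags": {"owner": "platform"}},
]

print(audit_tags(inventory))
```

Resources that land in the `unmanaged` or `missing` groups are good candidates for a closer look during the next drift check.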
By following these practices, you can reduce the chances of drift and keep your CloudFormation stacks in sync with your planned setup.
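To make the regular check concrete, here is a minimal sketch that decides which stacks need attention. It assumes input shaped like the `StackSummaries` entries returned by the CloudFormation `ListStacks` API, where each summary carries a `DriftInformation` block with `StackDriftStatus` and, once a check has run, `LastCheckTimestamp`. The seven-day threshold and the sample stack names are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone


def stacks_to_review(stack_summaries, now=None, max_age=timedelta(days=7)):
    """Return (drifted, stale) stack names from ListStacks-style summaries."""
    now = now or datetime.now(timezone.utc)
    drifted, stale = [], []
    for summary in stack_summaries:
        info = summary.get("DriftInformation", {})
        status = info.get("StackDriftStatus", "NOT_CHECKED")
        last_check = info.get("LastCheckTimestamp")
        if status == "DRIFTED":
            drifted.append(summary["StackName"])
        # Never checked, or last checked longer ago than max_age.
        if last_check is None or now - last_check > max_age:
            stale.append(summary["StackName"])
    return drifted, stale


# Hand-written sample data (hypothetical stack names and dates).
sample = [
    {"StackName": "network", "DriftInformation": {
        "StackDriftStatus": "DRIFTED",
        "LastCheckTimestamp": datetime(2024, 1, 14, tzinfo=timezone.utc)}},
    {"StackName": "storage", "DriftInformation": {"StackDriftStatus": "NOT_CHECKED"}},
]

drifted, stale = stacks_to_review(sample, now=datetime(2024, 1, 15, tzinfo=timezone.utc))
print("drifted:", drifted)
print("needs a fresh check:", stale)
```

In a real account, the summaries could come from boto3's `list_stacks` paginator on a CloudFormation client, and the function could run from a weekly scheduled job.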
What causes configuration drift?
Drift can happen when changes are made directly through the cloud console or when multiple IaC tools are used to manage the same resources. Let’s explore these two leading causes:
Direct Changes via Cloud Console
One common way drift occurs is when changes are made directly in the AWS Console, bypassing CloudFormation. This could be anything from modifying an EC2 instance type to changing security group rules or even modifying a VPC’s CIDR block.
In this section, we will walk through an example where we create a VPC using a CloudFormation template and then modify the CIDR block of that VPC through the AWS Console. CloudFormation will not be aware of this change, so the actual state of the VPC will differ from the configuration defined in the CloudFormation template, resulting in drift. We will then demonstrate how to detect this drift using the AWS CLI.
Let's begin with the CloudFormation template to create a VPC and an IAM role:
To create the stack, you can use the AWS CLI with the following command:
This will create the VPC and IAM role defined in the template. You can verify the stack creation by checking the resources in the AWS Console or by using the CLI:
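As a rough stand-in for the kind of template this walkthrough relies on, the sketch below builds a comparable template in Python and serializes it to JSON, which CloudFormation accepts alongside YAML. The logical IDs (`FireflyVPC`, `FireflyRole`), the starting CIDR block `10.0.0.0/16`, and the EC2 trust policy on the role are assumptions made for illustration.

```python
import json


def build_template():
    """Return a minimal CloudFormation template with a VPC and an IAM role."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative VPC and IAM role for a drift walkthrough",
        "Resources": {
            "FireflyVPC": {  # assumed logical ID
                "Type": "AWS::EC2::VPC",
                "Properties": {
                    "CidrBlock": "10.0.0.0/16",  # assumed starting CIDR
                    "EnableDnsSupport": True,
                    "EnableDnsHostnames": True,
                    "Tags": [{"Key": "Name", "Value": "FireflyVPC"}],
                },
            },
            "FireflyRole": {  # assumed logical ID
                "Type": "AWS::IAM::Role",
                "Properties": {
                    # Trust policy letting EC2 assume the role (assumption).
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "ec2.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                },
            },
        },
    }


template_body = json.dumps(build_template(), indent=2)
print(template_body)
```

The resulting string can be passed as `TemplateBody` to boto3's `create_stack`, together with `Capabilities=["CAPABILITY_IAM"]`, because the template contains an IAM resource.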
Now, let's say you or a team member decide to update the VPC's CIDR block directly from the AWS Console. In the AWS Console, navigate to the VPC dashboard, select the "FireflyVPC," and change the CIDR block to something different, like 10.1.0.0/16.
After the change is made, the CloudFormation template is no longer in sync with the actual configuration of the VPC. To detect this drift, you can run the drift detection operation using the AWS CLI:
Once the operation completes, you can view the drift status with this command:
The output shows that the stack has a DRIFTED status, meaning one or more resources in the stack have configurations that no longer match the ones defined in the template. Specifically, “DriftedStackResourceCount”: 1 indicates that one resource within the stack has drifted from its expected configuration.
This is how you can easily detect drift in your CloudFormation configurations and make sure that your resources remain aligned with the defined templates.
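The same two steps, starting detection and then reading the status, can also be scripted. The sketch below takes a CloudFormation client as a parameter (for example `boto3.client("cloudformation")`) and uses the `detect_stack_drift` and `describe_stack_drift_detection_status` calls. The polling interval, the retry limit, and the choice to raise on a failed detection are assumptions made for this example.

```python
import time


def detect_stack_drift(cfn, stack_name, poll_seconds=5, max_polls=60):
    """Start drift detection on a stack and wait for the result."""
    started = cfn.detect_stack_drift(StackName=stack_name)
    detection_id = started["StackDriftDetectionId"]

    for _ in range(max_polls):
        status = cfn.describe_stack_drift_detection_status(
            StackDriftDetectionId=detection_id
        )
        if status["DetectionStatus"] != "DETECTION_IN_PROGRESS":
            break
        time.sleep(poll_seconds)
    else:
        raise TimeoutError(f"drift detection on {stack_name} did not finish in time")

    # Simplification: treat a failed detection as an error, even though
    # partial results may exist for resources that were checked.
    if status["DetectionStatus"] == "DETECTION_FAILED":
        raise RuntimeError(status.get("DetectionStatusReason", "drift detection failed"))

    return {
        "StackDriftStatus": status["StackDriftStatus"],
        "DriftedStackResourceCount": status.get("DriftedStackResourceCount", 0),
    }
```

Calling `detect_stack_drift(boto3.client("cloudformation"), "my-stack")`, where `my-stack` is a placeholder stack name, returns a small dict with the stack drift status and the number of drifted resources.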
Conflicts from Overlapping IaC Tools
Drift can also occur when multiple IaC tools are used to manage the same resource. For example, if a resource created with CloudFormation is later managed or modified using another IaC tool like Terraform, it leads to drift because Terraform maintains its own state file. Any changes made through Terraform will update its state file but won't be reflected in the CloudFormation template, causing inconsistencies between the two tools.
In this example, we will use the VPC created earlier with CloudFormation, import it into Terraform, and add a new tag using Terraform. This modification will cause drift in the CloudFormation managed resource.
First, we will use Terraform to manage the existing VPC created earlier with CloudFormation. Below is a Terraform configuration to define the VPC and add the new tag.
To import the existing VPC into Terraform, run the following command. Replace <vpc-id> with the ID of the VPC created by CloudFormation:
This command will import the current state of the VPC into Terraform.
Now, update the Terraform configuration to include the new tag. Apply the changes using the terraform apply command.
Terraform will update the tags for the VPC by adding the new tag. However, since this change was made using Terraform, CloudFormation won’t be aware of it, causing a drift between the CloudFormation template and the actual state of the VPC.
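One way to catch this kind of overlap before it causes drift is to compare the resources a stack owns with the resources tracked in Terraform state. The sketch below assumes two inputs: a response shaped like CloudFormation's `DescribeStackResources` output, and the parsed JSON from `terraform show -json`. It only looks at the root module and matches on the `id` attribute, which lines up with the CloudFormation physical ID for a VPC but not necessarily for every resource type. The sample IDs and addresses are hypothetical.

```python
def cfn_physical_ids(stack_resources):
    """Map physical ID -> logical ID from DescribeStackResources-style output."""
    return {
        res["PhysicalResourceId"]: res["LogicalResourceId"]
        for res in stack_resources.get("StackResources", [])
        if res.get("PhysicalResourceId")
    }


def terraform_ids(show_json):
    """Map resource ID -> address from `terraform show -json` (root module only)."""
    resources = show_json.get("values", {}).get("root_module", {}).get("resources", [])
    return {
        res["values"]["id"]: res["address"]
        for res in resources
        if "id" in res.get("values", {})
    }


def find_overlap(stack_resources, show_json):
    """List (physical ID, CloudFormation logical ID, Terraform address) overlaps."""
    cfn = cfn_physical_ids(stack_resources)
    tf = terraform_ids(show_json)
    return sorted((pid, cfn[pid], tf[pid]) for pid in cfn.keys() & tf.keys())


# Hand-written sample inputs (hypothetical IDs and names).
stack = {"StackResources": [
    {"LogicalResourceId": "FireflyVPC", "PhysicalResourceId": "vpc-0abc123"},
]}
state = {"values": {"root_module": {"resources": [
    {"address": "aws_vpc.firefly_vpc", "values": {"id": "vpc-0abc123"}},
]}}}

for physical_id, logical_id, address in find_overlap(stack, state):
    print(f"{physical_id} is managed by both {logical_id} and {address}")
```

Any resource the function reports has two owners, which is exactly the situation that produces the drift shown in this example.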
After applying the Terraform changes, you can run a drift detection operation in CloudFormation to identify the inconsistency:
When the detection operation has finished, we can look at the drift in the CloudFormation-managed stack. Instead of checking only the overall drift status, we’ll use the describe-stack-resource-drifts command to get detailed information about the specific resources that have drifted.
Run the following command to get the resource-level drift details:
This command provides a detailed breakdown of which resources have drifted and highlights the differences between their actual and expected properties. For this example, a new tag (Created_by=Terraform) was added to the VPC, which was not defined in the CloudFormation template.
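For larger stacks, the resource-level output is easier to read once it is condensed. The sketch below summarizes a response shaped like the `DescribeStackResourceDrifts` output, using its `StackResourceDrifts` and `PropertyDifferences` fields. The sample response is hand-written and simplified for illustration (the logical ID, the property path, and the omitted expected value are assumptions), so real output will contain more fields.

```python
def summarize_drifts(response):
    """Turn a DescribeStackResourceDrifts-style response into readable lines."""
    lines = []
    for drift in response.get("StackResourceDrifts", []):
        status = drift["StackResourceDriftStatus"]
        if status == "IN_SYNC":
            continue  # only report resources that differ from the template
        lines.append(f'{drift["LogicalResourceId"]} ({drift["ResourceType"]}): {status}')
        for diff in drift.get("PropertyDifferences", []):
            lines.append(
                f'  {diff["DifferenceType"]} {diff["PropertyPath"]}: '
                f'expected={diff.get("ExpectedValue")} actual={diff.get("ActualValue")}'
            )
    return lines


# Hand-written, simplified sample response (hypothetical values).
sample_response = {"StackResourceDrifts": [{
    "LogicalResourceId": "FireflyVPC",
    "ResourceType": "AWS::EC2::VPC",
    "StackResourceDriftStatus": "MODIFIED",
    "PropertyDifferences": [{
        "DifferenceType": "ADD",
        "PropertyPath": "/Tags/1",
        "ActualValue": '{"Key":"Created_by","Value":"Terraform"}',
    }],
}]}

for line in summarize_drifts(sample_response):
    print(line)
```

With boto3, the input could come from `describe_stack_resource_drifts(StackName=...)` on a CloudFormation client.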
From these two examples, we can see that CloudFormation’s built-in tools, such as the describe-stack-resource-drifts command, provide useful insight into drift but still require significant manual effort. We were able to detect the drift and see the differences, but there’s no automated way to analyze the drift efficiently. In larger setups with multiple resources and stacks, this process becomes even more difficult. This is where Firefly steps in to automate and simplify drift detection and management.
Introducing Firefly: A Simple Solution for Managing CloudFormation Drift
Firefly makes the process of identifying and resolving drifted resources in your infrastructure much simpler. When a resource has drifted from its defined configuration, Firefly provides a clear, visual representation of the drift, making it easy to understand the changes and take corrective action.
In the Firefly dashboard, all resources are presented in one place, including drifted ones. If you click on the drifted resources section, it takes you to the inventory view, where all drifted resources are listed. This centralized view lets you quickly locate and focus on resources that require attention.
By clicking on any drifted resource, you can access detailed information about it. This includes its current status, properties, tags, and other metadata, like its creation date and location.
If you then click on Drift Details, Firefly shows a side-by-side comparison of the desired configuration (as defined in your IaC tool) and the actual configuration. This makes it clear what has changed. For example, the desired configuration might set a property like associate public IP address to true, while the actual configuration shows it as false. This detailed breakdown helps you pinpoint exactly what needs to be fixed.
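The idea behind a desired-versus-actual comparison can be sketched in a few lines. The example below is a generic illustration, not Firefly's implementation. It compares two flat dictionaries and reports only the properties that differ. The property names and values are assumptions, loosely based on the public IP example above.

```python
def compare_config(desired, actual):
    """Return (property, desired value, actual value) for each difference."""
    rows = []
    for key in sorted(desired.keys() | actual.keys()):
        want = desired.get(key, "<absent>")
        have = actual.get(key, "<absent>")
        if want != have:
            rows.append((key, want, have))
    return rows


# Hypothetical configurations for a single resource.
desired = {"associate_public_ip_address": True, "instance_type": "t2.micro"}
actual = {"associate_public_ip_address": False, "instance_type": "t2.micro"}

for prop, want, have in compare_config(desired, actual):
    print(f"{prop}: desired={want} actual={have}")
```

Nested properties such as tag lists or security group rules would need a recursive comparison, which this flat sketch leaves out.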
Firefly doesn’t stop at showing you the drift. It helps you fix it by providing an option to codify the drifted resource. For example, you can generate an updated configuration as a CloudFormation or Terraform template. This makes it easier to bring the resource back in line with your IaC configurations. By using Firefly’s codify feature, you keep your infrastructure consistent, organized, and easy to manage.
With this much information at your fingertips, you can quickly assess the impact of the drift and take corrective actions, either by updating the IaC configuration or bringing the resource back in sync with the defined state.