Choosing Between AWS CloudFormation and Terraform for Disaster Recovery

By Firefly

Explore the key differences between AWS CloudFormation and Terraform to determine which tool best supports your disaster recovery strategy.


Why Does Disaster Recovery Need the Right IaC Tool?

Disaster recovery is an important part of any DevOps workflow, especially when you are working with cloud providers like AWS, GCP, Azure, or Alibaba Cloud. These cloud platforms offer features like auto-scaling, region-based replication, and built-in backups, but this doesn’t mean your infrastructure is resistant to failure. For example, a configuration error might cause an EC2 instance to become unreachable, or a network issue could prevent your application from accessing its S3 storage. These failures can have some serious effects if you don’t have a disaster recovery plan in place.

The purpose of disaster recovery is to get your services back up and running quickly after a failure. For example, if an EC2 instance in your primary region becomes corrupted or an S3 bucket is accidentally deleted by one of your teammates, you need to be able to restore it as soon as possible. If your application depends on these resources, downtime could lead to loss of customer access or service interruptions. A properly configured IaC tool helps you restore the necessary infrastructure quickly and with little manual work.

CloudFormation, Terraform, Pulumi, and other IaC tools can automate this recovery process by defining your infrastructure as code. This allows you to redeploy resources automatically when failures occur, cutting out manual steps such as re-creating configurations by hand. With IaC, the infrastructure you rebuild comes from the same definitions as the original deployment, which greatly reduces the risk of inconsistencies between the two.

For example, let’s say your application is running on EC2 instances in the us-east-1 region and relies on S3 buckets for file storage. If a failure occurs, such as a regional network issue or an accidental deletion of an S3 bucket, your infrastructure could be disrupted. In this case, you need a reliable way to fail over to a secondary region like us-west-2 without significant downtime for your application.

With CloudFormation, you can use stacks to define your EC2 instances, S3 buckets, and other resources. When an issue occurs, you can quickly replicate those resources to another region with minimal changes to the configuration. Terraform, however, provides more flexibility than CloudFormation. It allows you to manage your infrastructure not only within AWS but across multiple cloud providers, meaning you can set up disaster recovery strategies that span AWS, GCP, and even Alibaba Cloud. This is especially useful if you're running a multi-cloud environment and want a unified approach to disaster recovery across all the cloud providers.

Additionally, Terraform offers state management: it records the resources it manages in a state file. If the environment diverges from the desired state due to configuration drift or changes made through the cloud’s console, terraform plan surfaces the difference and terraform apply brings everything back to the defined state. CloudFormation also offers drift detection for stacks, but reconciling drifted resources is a more manual process.
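As a rough illustration of how that drift check can be scripted, the sketch below wraps terraform plan with its -detailed-exitcode flag, which exits with 0 when nothing differs, 2 when changes are present, and 1 on error. The function name check_drift is a hypothetical helper, and the sketch assumes it runs inside an already initialized Terraform working directory.

```shell
# Sketch: report whether live infrastructure differs from the Terraform
# configuration in the current directory.
# check_drift is a hypothetical helper name, not a Terraform command.
check_drift() {
  rc=0
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present
  terraform plan -detailed-exitcode -input=false >/dev/null 2>&1 || rc=$?
  case $rc in
    0) echo "in-sync" ;;
    2) echo "drift-detected" ;;
    *) echo "plan-failed"; return 1 ;;
  esac
}
```

Note that exit code 2 also fires for configuration edits that have not been applied yet, so a result of drift-detected is most meaningful when the code in the directory matches what was last applied.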

Some engineers might think that since cloud providers offer built-in resiliency, disaster recovery isn’t necessary. However, while features like EC2 automatic recovery and S3 versioning are helpful, they don’t cover every use case. For example, a misconfiguration in a load balancer or a change within your infrastructure that isn't captured by the provider’s built-in tools could cause outages. In these cases, IaC tools allow you to recreate infrastructure exactly as it was defined, so you’re not left fixing misconfigured services or rebuilding failed resources by hand.

Comparing CloudFormation and Terraform for Recovery Scenarios

Now that we’ve discussed the importance of a disaster recovery plan and how IaC tools help you automate this DR process, let’s compare how CloudFormation and Terraform handle specific recovery scenarios, especially when you are dealing with multi-cloud and multi-region cloud setups.

Handling Multi-Cloud and Multi-Region Setups

When setting up a disaster recovery plan, the ability to handle multi-cloud and multi-region environments matters. Terraform excels here because it lets you manage infrastructure across multiple cloud providers, including AWS, GCP, and Azure, so your disaster recovery strategy can span more than one of them. For example, if AWS experiences a region-specific outage, you can use a separate set of Terraform configurations for GCP or Azure to fail over to a different cloud provider. The exact configuration differs for each provider, but Terraform keeps all of it as code in one workflow, so you can manage and automate disaster recovery across environments with the same tooling.

On the other hand, CloudFormation is AWS-specific. It can handle multi-region setups within AWS, but it cannot easily manage resources outside AWS. If you're using multiple cloud providers, you’ll need to use different tools or manually configure each cloud, making Terraform the better choice for multi-cloud environments.
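For the multi-region case within AWS, one possible approach is to deploy the same template once per region with the AWS CLI. The sketch below uses aws cloudformation deploy, which creates the stack if it is missing and updates it otherwise. The helper name deploy_to_regions is hypothetical, and the sketch assumes the template takes globally unique names (S3 bucket names, IAM role names) from parameters or generates them, since a template with hard-coded names cannot be deployed twice.

```shell
# Sketch: deploy one CloudFormation template to several regions.
# Usage: deploy_to_regions STACK_NAME TEMPLATE_FILE REGION [REGION...]
# deploy_to_regions is a hypothetical helper name.
deploy_to_regions() {
  stack_name=$1
  template=$2
  shift 2
  for region in "$@"; do
    echo "Deploying $stack_name to $region"
    # `deploy` creates the stack if needed, otherwise applies a change set
    aws cloudformation deploy \
      --stack-name "$stack_name" \
      --template-file "$template" \
      --region "$region" \
      --capabilities CAPABILITY_NAMED_IAM || return 1
  done
}

# Example call (not executed here):
#   deploy_to_regions my-dr-stack template.yaml us-east-1 us-west-2
```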

Ease of Use for Teams

CloudFormation works well with AWS, which is convenient if your infrastructure is mainly built on AWS. However, using CloudFormation requires writing configurations in JSON or YAML, which can be difficult for teams that are new to Infrastructure as Code. Setting up resources in CloudFormation also requires defining complex dependencies between them, which can lead to mistakes if not done carefully.

Terraform, on the other hand, uses HCL, which is easier for most DevOps engineers to read and understand. It also has a large ecosystem of pre-built modules that simplifies disaster recovery setup, so teams can deploy quickly without reinventing the wheel.

Keeping Infrastructure State in Check

Terraform uses state files to track infrastructure and detect configuration drift, making sure that your environment matches the desired state within the cloud. This feature is especially useful for disaster recovery: terraform plan identifies discrepancies, and the next terraform apply corrects them.

CloudFormation tracks resources within stacks and can run drift detection on them. However, it does not reconcile drift the way a Terraform apply does. If resources are modified outside CloudFormation, the stack and the real environment can become inconsistent, making it harder to recover or replicate the environment accurately during disaster recovery.
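To see what CloudFormation itself reports about out-of-band changes, a stack drift check can be started and polled from the AWS CLI. The sketch below is one way to do that with detect-stack-drift and describe-stack-drift-detection-status. The helper name stack_drift_status, the polling interval, and the 60-attempt cap are assumptions made for illustration.

```shell
# Sketch: run CloudFormation drift detection and print the result
# (for example DRIFTED or IN_SYNC).
# Usage: stack_drift_status STACK_NAME [POLL_SECONDS]
# stack_drift_status is a hypothetical helper name.
stack_drift_status() {
  stack_name=$1
  interval=${2:-5}
  detection_id=$(aws cloudformation detect-stack-drift \
    --stack-name "$stack_name" \
    --query StackDriftDetectionId --output text) || return 1

  tries=0
  while [ "$tries" -lt 60 ]; do
    status=$(aws cloudformation describe-stack-drift-detection-status \
      --stack-drift-detection-id "$detection_id" \
      --query DetectionStatus --output text) || return 1
    # Stop polling once detection has completed or failed
    if [ "$status" != "DETECTION_IN_PROGRESS" ]; then
      break
    fi
    tries=$((tries + 1))
    sleep "$interval"
  done

  aws cloudformation describe-stack-drift-detection-status \
    --stack-drift-detection-id "$detection_id" \
    --query StackDriftStatus --output text
}
```

Drift detection only reports differences. Bringing the resources back in line is still up to you, for instance by reverting the manual change or updating the stack.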

Hands-On: Setting Up Disaster Recovery with CloudFormation

In this section, we’ll walk through how to use CloudFormation to set up disaster recovery by configuring S3 replication between regions, and then test the replication to make sure everything works as expected.

The first part of the setup focuses on replicating your data between two S3 buckets. This ensures that even if there is an issue in the primary region, your data can be quickly restored from the secondary region.

We first need to create two S3 buckets: one in the primary region (e.g., us-east-1) and one in the secondary region (e.g., us-west-2). Versioning must be enabled for both buckets to ensure that objects can be replicated correctly.

The CloudFormation YAML below creates two buckets and enables versioning on both:

Resources:
  PrimaryBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: firefly-dr-primary-bucket
      VersioningConfiguration:
        Status: Enabled
  BackupBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: firefly-dr-backup-bucket
      VersioningConfiguration:
        Status: Enabled

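This configuration will create a primary bucket called firefly-dr-primary-bucket and a backup bucket called firefly-dr-backup-bucket, both with versioning enabled to allow for proper replication. Keep in mind that a CloudFormation stack deploys into a single region, so as written both buckets land in the region where the stack is created. To place the backup bucket in us-west-2, deploy it from a separate stack (or a StackSet) in that region.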
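Because replication depends on versioning, it can be worth confirming the setting on each bucket before moving on. A minimal sketch using aws s3api get-bucket-versioning follows. The helper name versioning_enabled is hypothetical.

```shell
# Sketch: succeed only if versioning is enabled on the given bucket.
# Usage: versioning_enabled BUCKET_NAME
# versioning_enabled is a hypothetical helper name.
versioning_enabled() {
  # Status is "Enabled" or "Suspended"; with --output text it prints
  # "None" when versioning was never configured.
  status=$(aws s3api get-bucket-versioning \
    --bucket "$1" --query Status --output text) || return 1
  [ "$status" = "Enabled" ]
}

# Example call (not executed here):
#   versioning_enabled firefly-dr-primary-bucket && echo "primary ok"
#   versioning_enabled firefly-dr-backup-bucket && echo "backup ok"
```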

To replicate data between these buckets, IAM permissions are required. We will create an IAM role that allows S3 to perform replication tasks, such as ReplicateObject and GetObjectVersionForReplication.

The CloudFormation YAML below creates the IAM role and attaches the necessary policy:

Resources:
  ReplicationRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: Firefly-DR-S3ReplicationRole
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: s3.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: Firefly-DR-S3ReplicationPolicy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # Read access on the source bucket and its object versions
              - Effect: Allow
                Action:
                  - s3:GetReplicationConfiguration
                  - s3:ListBucket
                  - s3:GetObjectVersion
                  - s3:GetObjectVersionForReplication
                  - s3:ReplicateObject
                  - s3:ReplicateDelete
                  - s3:ReplicateTags
                Resource:
                  - !Sub arn:aws:s3:::firefly-dr-primary-bucket
                  - !Sub arn:aws:s3:::firefly-dr-primary-bucket/*
              # Write access for replicas in the destination bucket
              - Effect: Allow
                Action:
                  - s3:ReplicateObject
                  - s3:ReplicateDelete
                  - s3:ReplicateTags
                Resource:
                  - !Sub arn:aws:s3:::firefly-dr-backup-bucket
                  - !Sub arn:aws:s3:::firefly-dr-backup-bucket/*

This role will allow AWS S3 to assume the permissions needed to replicate objects from the primary bucket to the backup bucket.

Now that the IAM role and permissions are set up, the next step is to configure S3 replication. We will add a replication rule that tells S3 to replicate objects from the primary bucket to the backup bucket automatically.

The CloudFormation YAML below adds the replication configuration to the primary S3 bucket:

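Resources:
  # Replication is a property of AWS::S3::Bucket, so the rule is added to
  # the PrimaryBucket resource defined earlier rather than a separate resource.
  PrimaryBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: firefly-dr-primary-bucket
      VersioningConfiguration:
        Status: Enabled
      ReplicationConfiguration:
        Role: !GetAtt ReplicationRole.Arn
        Rules:
          - Status: Enabled
            Prefix: ''  # empty prefix: replicate every object
            Destination:
              # Destination expects the bucket ARN, not the bucket name
              Bucket: arn:aws:s3:::firefly-dr-backup-bucket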

This setup makes sure that any object added to the primary bucket (firefly-dr-primary-bucket) will be automatically replicated to the backup bucket (firefly-dr-backup-bucket) in another region.
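Once the stack has been deployed, the rule that S3 actually stored can be read back with aws s3api get-bucket-replication. The sketch below prints each rule's status and destination and succeeds only if a rule is enabled. The helper name replication_enabled is hypothetical.

```shell
# Sketch: show the replication rules on a bucket and succeed only if
# at least one rule is enabled.
# Usage: replication_enabled BUCKET_NAME
# replication_enabled is a hypothetical helper name.
replication_enabled() {
  rules=$(aws s3api get-bucket-replication --bucket "$1" \
    --query 'ReplicationConfiguration.Rules[].[Status, Destination.Bucket]' \
    --output text) || return 1
  # One line per rule: "<Status><TAB><destination bucket ARN>"
  printf '%s\n' "$rules"
  printf '%s\n' "$rules" | grep -q '^Enabled'
}

# Example call (not executed here):
#   replication_enabled firefly-dr-primary-bucket
```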

Once your CloudFormation template is ready, you can create the stack using the following AWS CLI command:

aws cloudformation create-stack --stack-name Firefly-DR-Setup \
  --template-body file://cloudformation-template.yaml \
  --capabilities CAPABILITY_NAMED_IAM

This command will create the necessary resources as defined in your CloudFormation template.
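The create-stack call returns as soon as the request is accepted, so a script usually needs to wait for the stack to finish before using its resources. The sketch below pairs create-stack with aws cloudformation wait stack-create-complete and then prints the final status. The helper name create_and_wait is hypothetical.

```shell
# Sketch: create a stack, block until creation finishes, print its status.
# Usage: create_and_wait STACK_NAME TEMPLATE_FILE
# create_and_wait is a hypothetical helper name.
create_and_wait() {
  stack_name=$1
  template=$2
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-body "file://$template" \
    --capabilities CAPABILITY_NAMED_IAM >/dev/null || return 1

  # Returns non-zero if the stack fails or rolls back
  aws cloudformation wait stack-create-complete \
    --stack-name "$stack_name" || return 1

  aws cloudformation describe-stacks --stack-name "$stack_name" \
    --query 'Stacks[0].StackStatus' --output text
}

# Example call (not executed here):
#   create_and_wait Firefly-DR-Setup cloudformation-template.yaml
```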

Now, to allow S3 replication to function smoothly, the primary bucket needs the correct permissions. The following JSON bucket policy gives the replication role access to the primary bucket and its objects:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::975*******482:role/Firefly-DR-S3ReplicationRole"
      },
      "Action": [
        "s3:GetReplicationConfiguration",
        "s3:ListBucket",
        "s3:GetObjectVersion",
        "s3:ReplicateObject"
      ],
      "Resource": [
        "arn:aws:s3:::firefly-dr-primary-bucket",
        "arn:aws:s3:::firefly-dr-primary-bucket/*"
      ]
    }
  ]
}

To apply this policy, run this command:

aws s3api put-bucket-policy --bucket firefly-dr-primary-bucket --policy file://s3-replication-policy.json

To test the replication, we begin by creating a simple test file. Use the following command to write a message into a file called testfile.txt:

echo "Testing S3 replication" > testfile.txt

Once the file is created, we upload it to the primary S3 bucket using the aws s3 cp command. This will copy the test file to the firefly-dr-primary-bucket:

aws s3 cp testfile.txt s3://firefly-dr-primary-bucket/

After uploading the file, the next step is to check if the replication to the secondary S3 bucket has worked. You can do this by listing the files in the backup bucket using the aws s3 ls command. Run the following command to check the contents of the firefly-dr-backup-bucket:

aws s3 ls s3://firefly-dr-backup-bucket/

If the replication is working correctly, you should see the testfile.txt file appear in the secondary bucket. This confirms that the replication process has been set up correctly and that the data has been successfully copied from the primary bucket to the secondary bucket.
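Replication is asynchronous, so the object may not be in the backup bucket the instant the upload finishes. One way to make the check less timing-sensitive is to poll the ReplicationStatus that S3 reports on the source object through aws s3api head-object. The sketch below does that. The helper name wait_for_replication and the default of 12 attempts at 5-second intervals are assumptions for illustration.

```shell
# Sketch: poll the source object's replication status until it settles.
# Usage: wait_for_replication SRC_BUCKET KEY [ATTEMPTS] [SLEEP_SECONDS]
# wait_for_replication is a hypothetical helper name.
wait_for_replication() {
  bucket=$1
  key=$2
  attempts=${3:-12}
  interval=${4:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # On the source object this is typically PENDING, COMPLETED, or FAILED
    status=$(aws s3api head-object --bucket "$bucket" --key "$key" \
      --query ReplicationStatus --output text) || return 1
    case $status in
      COMPLETED|COMPLETE) echo "replicated"; return 0 ;;
      FAILED)             echo "failed"; return 1 ;;
    esac
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed-out"
  return 1
}

# Example call (not executed here):
#   wait_for_replication firefly-dr-primary-bucket testfile.txt
```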

With S3 replication successfully verified, the disaster recovery setup is complete. Your data is now safely replicated across regions, ensuring quick recovery in case of a failure.

Hands-On: Building a Recovery Plan with Terraform

As we’ve seen with CloudFormation, setting up S3 replication for disaster recovery is fairly simple. Now, let’s look at how we can achieve the same setup using Terraform. With Terraform, we can automate the replication of S3 buckets across regions and keep the disaster recovery plan integrated into our IaC.

We will start by defining two AWS providers, one for the primary region (us-east-1) and another for the secondary region (us-west-2). This enables us to manage resources in multiple regions.

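provider "aws" {
  region = "us-east-1" # Primary region
}

provider "aws" {
  alias  = "secondary"
  region = "us-west-2" # Secondary region
}

# Random suffix referenced by the bucket and IAM names that follow,
# which keeps those names globally unique.
resource "random_id" "bucket_suffix" {
  byte_length = 4
}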

For disaster recovery, it's important to replicate your data across multiple regions to ensure availability even if one region experiences issues. We’ll start by creating two S3 buckets, one in each region. We’ll also enable versioning on both buckets to ensure that the objects can be replicated properly.

First, we will create a primary S3 bucket in the us-east-1 region:

resource "aws_s3_bucket" "primary_bucket" {
  bucket = "primary-bucket-dr-${random_id.bucket_suffix.hex}"
}

Next, we enable versioning on the primary S3 bucket to ensure that any changes to objects are tracked and can be replicated:

resource "aws_s3_bucket_versioning" "primary_bucket_versioning" {
  bucket = aws_s3_bucket.primary_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

Now, we create the secondary S3 bucket in the us-west-2 region and enable versioning on it as well:

resource "aws_s3_bucket" "secondary_bucket" {
  provider = aws.secondary
  bucket   = "secondary-bucket-dr-${random_id.bucket_suffix.hex}"
}

resource "aws_s3_bucket_versioning" "secondary_bucket_versioning" {
  provider = aws.secondary
  bucket   = aws_s3_bucket.secondary_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

Now, for replication to work, we need to configure an IAM role that allows S3 to perform replication tasks. We will create the IAM role and attach the necessary policy that grants permissions for replication.

resource "aws_iam_role" "s3_replication_role" {
  name = "s3-replication-role-${random_id.bucket_suffix.hex}"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [{
      Effect    = "Allow",
      Principal = { Service = "s3.amazonaws.com" },
      Action    = "sts:AssumeRole"
    }]
  })
}

Next, we attach a replication policy to the IAM role:

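resource "aws_iam_policy" "s3_replication_policy" {
  name = "s3-replication-policy-${random_id.bucket_suffix.hex}"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        # Bucket-level read access on the source bucket
        Effect   = "Allow",
        Action   = ["s3:GetReplicationConfiguration", "s3:ListBucket"],
        Resource = [aws_s3_bucket.primary_bucket.arn]
      },
      {
        # Object-level read access: these actions apply to objects,
        # so the resource needs the "/*" suffix
        Effect   = "Allow",
        Action   = ["s3:GetObjectVersionForReplication", "s3:GetObjectVersionAcl", "s3:GetObjectVersionTagging"],
        Resource = ["${aws_s3_bucket.primary_bucket.arn}/*"]
      },
      {
        # Write access for replicas in the destination bucket
        Effect   = "Allow",
        Action   = ["s3:ReplicateObject", "s3:ReplicateDelete", "s3:ReplicateTags"],
        Resource = ["${aws_s3_bucket.secondary_bucket.arn}/*"]
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "s3_replication_attach" {
  role       = aws_iam_role.s3_replication_role.name
  policy_arn = aws_iam_policy.s3_replication_policy.arn
}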

Once the IAM role and policy are set up, we configure S3 replication to automatically replicate objects from the primary bucket in us-east-1 to the secondary bucket in us-west-2:

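resource "aws_s3_bucket_replication_configuration" "replication" {
  # Versioning must be enabled on both buckets before replication is configured
  depends_on = [
    aws_s3_bucket_versioning.primary_bucket_versioning,
    aws_s3_bucket_versioning.secondary_bucket_versioning
  ]

  role   = aws_iam_role.s3_replication_role.arn
  bucket = aws_s3_bucket.primary_bucket.id

  rule {
    id     = "replication-rule"
    status = "Enabled"

    destination {
      bucket = aws_s3_bucket.secondary_bucket.arn
    }
  }
}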

With the setup complete, the next step is to initialize Terraform and apply the configuration to create the resources. Run the following command to initialize Terraform:

terraform init

Once the initialization is complete, apply the configuration to provision the resources with the terraform apply command.
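For unattended runs, such as a recovery pipeline, a common pattern is to save the plan to a file and apply exactly that plan, so what gets applied is what was reviewed. The sketch below chains the three commands and stops at the first failure. The helper name provision and the plan file name dr.tfplan are assumptions for illustration.

```shell
# Sketch: initialize, plan to a file, then apply that saved plan.
# provision and dr.tfplan are hypothetical names.
provision() {
  terraform init -input=false || return 1
  # Save the plan so the apply step executes exactly what was planned
  terraform plan -input=false -out=dr.tfplan || return 1
  # Applying a saved plan file does not prompt for confirmation
  terraform apply -input=false dr.tfplan
}
```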

After the resources are created, we’ll now test the replication. To do this, we create a simple test file and upload it to the primary bucket:

echo "Testing S3 replication" > testfile.txt
aws s3 cp testfile.txt s3://primary-bucket-dr-e96b88ab/

Next, we list the files in the secondary bucket to confirm that the file has been replicated:

aws s3 ls s3://secondary-bucket-dr-e96b88ab/

If the setup is correct, you should see the testfile.txt file in the secondary bucket.
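Because the bucket names end in a random suffix, the exact names differ on every deployment (e96b88ab above is just the suffix from one run). One way to avoid copying names by hand is to look the bucket up by its prefix with a JMESPath filter on aws s3api list-buckets, as sketched below. The helper name find_bucket is hypothetical, and the sketch assumes only one bucket in the account matches each prefix.

```shell
# Sketch: print the name of the first bucket whose name starts with PREFIX.
# Usage: find_bucket PREFIX
# find_bucket is a hypothetical helper name. Prints "None" if nothing matches.
find_bucket() {
  aws s3api list-buckets \
    --query "Buckets[?starts_with(Name, '$1')].Name | [0]" \
    --output text
}

# Example calls (not executed here):
#   primary=$(find_bucket primary-bucket-dr-)
#   secondary=$(find_bucket secondary-bucket-dr-)
#   aws s3 cp testfile.txt "s3://$primary/"
#   aws s3 ls "s3://$secondary/"
```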

Once the replication is successfully tested, your disaster recovery setup is complete. You now have a strong and automated solution for ensuring data continuity across regions, minimizing downtime, and enabling a quick recovery in case of any disruptions.

Best Practices for Disaster Recovery Using IaC

Now that we’ve seen how to set up a disaster recovery plan using CloudFormation and Terraform, it’s important to focus on making sure the process is reliable in the long run. Having the right setup is just part of the picture; following best practices can help ensure everything works smoothly when you need it the most. In this section, we’ll go over some practical tips for managing disaster recovery with IaC.

Modularize Your Code

When managing infrastructure as code, it's important to break down your configuration into smaller, reusable modules. For example, you could create separate modules for networking, instances, and storage. This practice makes your codebase more maintainable and allows for easier scaling. Additionally, modularizing your code allows you to apply updates to specific components without affecting the entire environment.

Automate Disaster Recovery Testing

Automated tests are essential to ensure your disaster recovery plan works as expected. Set up automated tests that simulate failure scenarios and confirm that critical services, such as Route 53 failover or S3 replication, are functioning correctly. By automating these tests, you reduce human error and make sure your DR setup is always ready to handle a real disaster.
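As one possible shape for such a test, the sketch below uploads a uniquely named object to the primary bucket and then polls the backup bucket until the replica appears, printing PASS or FAIL so that a scheduler or CI job can act on the result. The helper name dr_smoke_test, the dr-smoke-test/ key prefix, and the default retry settings are all assumptions for illustration.

```shell
# Sketch: end-to-end replication smoke test suitable for a scheduled job.
# Usage: dr_smoke_test SRC_BUCKET DST_BUCKET [ATTEMPTS] [SLEEP_SECONDS]
# dr_smoke_test and the key prefix are hypothetical names.
dr_smoke_test() {
  src=$1
  dst=$2
  attempts=${3:-12}
  interval=${4:-5}
  # Timestamped key so each run checks a fresh object
  key="dr-smoke-test/$(date +%s).txt"

  tmp=$(mktemp) || return 1
  echo "DR smoke test" > "$tmp"
  aws s3 cp "$tmp" "s3://$src/$key" >/dev/null || { rm -f "$tmp"; return 1; }
  rm -f "$tmp"

  i=0
  while [ "$i" -lt "$attempts" ]; do
    # head-object succeeds only once the replica exists in the destination
    if aws s3api head-object --bucket "$dst" --key "$key" >/dev/null 2>&1; then
      echo "PASS"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "FAIL"
  return 1
}
```

The test objects are left in place here. A real job would likely clean them up or apply a lifecycle rule to the prefix.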

Monitor Infrastructure Drift

Configuration drift occurs when changes are made outside of the IaC tool, for example, via the AWS Management Console. This can lead to differences between the desired and actual state of your infrastructure. Regularly monitor and manage drift to keep your infrastructure aligned with the configuration defined in your IaC code. Terraform surfaces these changes when it compares its state with the live environment during a plan, while CloudFormation offers stack drift detection that you need to run and act on yourself.

Conduct Regular Testing

Testing is a continuous process in disaster recovery planning. Running scheduled tests simulates failure scenarios and makes sure that everything from data replication to DNS failover works smoothly. Regular testing minimizes downtime in case of a real disaster by making sure that your recovery plan is fully operational. Make sure your tests cover all important resources and failover strategies.

As we’ve seen in the hands-on sections, setting up disaster recovery with IaC tools like CloudFormation and Terraform can provide a solid foundation for ensuring infrastructure availability. However, one important aspect we need to consider is that relying just on IaC tools doesn’t always guarantee that all of your resources are properly tracked and recoverable.

Let’s say your primary region goes down, and your backup resources, whether in a secondary region or across another cloud, need to be restored quickly. But what if some of those resources were created outside of your IaC setup, and you’ve missed tracking them? In many cases, if an untracked resource gets deleted or becomes inaccessible, it could complicate the recovery process, even with the best disaster recovery strategy in place.

Using Firefly to Simplify Disaster Recovery

This is where Firefly steps in to solve this issue. With Firefly, you can make sure that all your infrastructure is properly tracked and managed, even the resources that were originally unmanaged or created outside of IaC tools like Terraform or CloudFormation.

Convert Unmanaged Resources to Code

The problem we’ve discussed of untracked resources can now be solved easily with Firefly’s Codify feature. Firefly allows you to identify unmanaged resources and bring them into your IaC configuration by converting them into code. If some resources weren’t initially managed with Terraform or CloudFormation, Firefly’s Codify option helps you turn them into Infrastructure as Code, effectively tracking them going forward. This process can be done with an import command to easily include resources that were manually created or forgotten during the initial setup.

Once all resources are under management in IaC, you can rest easier knowing that when disaster strikes, running terraform init and terraform apply will recreate the infrastructure defined in your code.
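Independent of any particular product, the underlying step of bringing an existing bucket under Terraform management can be sketched with the stock terraform import command, which takes a resource address and, for an S3 bucket, the bucket name as the ID. The sketch below is not Firefly's workflow. The helper name import_bucket and the example address are hypothetical, and a matching resource block must already exist in the configuration for the import to succeed.

```shell
# Sketch: import an existing S3 bucket into Terraform state unless it is
# already tracked.
# Usage: import_bucket RESOURCE_ADDRESS BUCKET_NAME
# import_bucket is a hypothetical helper name.
import_bucket() {
  address=$1
  bucket=$2
  # -F: fixed string, -x: whole-line match, so dots are not treated as regex
  if terraform state list | grep -Fqx "$address"; then
    echo "already-managed"
    return 0
  fi
  terraform import "$address" "$bucket" >/dev/null || return 1
  echo "imported"
}

# Example call (not executed here; address and bucket name are placeholders):
#   import_bucket aws_s3_bucket.legacy my-manually-created-bucket
```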

Access Deleted Resources and History

Another common challenge during disaster recovery is that sometimes resources are mistakenly deleted. Without an easy way to restore them, you could face extended downtime or even data loss. Firefly solves this issue by keeping a record of deleted resources, allowing you to easily restore them when needed. You can also view the history of any resource, track changes, and identify when and how those changes occurred.

This means that if a resource is deleted or changes unexpectedly, Firefly enables you to go back and retrieve that resource or understand its previous state. This added visibility and restore functionality gives you peace of mind, knowing that all your important assets can be recovered easily.

Firefly continuously monitors your infrastructure, making sure that your recovery plan stays aligned with your actual environment. By regularly checking for drift, Firefly helps you catch any differences and makes sure your infrastructure is always in the state that it should be. If any drift is detected, you can quickly bring everything back to the desired state with minimal intervention from your end.

Frequently Asked Questions

What are the disadvantages of AWS CloudFormation?

CloudFormation is AWS-specific, limiting multi-cloud support. It uses JSON/YAML, which can be complex to manage at scale.

What are the disadvantages of Terraform?

Terraform requires careful state management, and its flexibility can lead to complexity in large environments.

When should you not use Terraform?

Avoid Terraform if you're solely working within AWS, where CloudFormation offers better native integration.

What is the difference between CFN and TF?

CloudFormation is AWS-specific, while Terraform supports multi-cloud environments. Terraform also keeps its own state file that you manage, whereas CloudFormation tracks state for you inside each stack.

What is the difference between Terraform modules and CloudFormation modules?

Terraform modules are reusable across multiple clouds, while CloudFormation uses nested stacks, which are AWS-specific.
