Infrastructure as Code & Automation
Master CloudFormation, CDK, Terraform, and CI/CD pipelines for automated infrastructure deployment
Master Infrastructure as Code (IaC) and automation with free flashcards and hands-on examples. This lesson covers CloudFormation templates, Terraform configurations, automation strategies, and CI/CD pipelines: essential skills for AWS DevOps practitioners and cloud architects.
Welcome 👋
Infrastructure as Code has revolutionized how we deploy and manage cloud resources. Instead of manually clicking through the AWS console, IaC lets you define your entire infrastructure in code files that can be versioned, reviewed, and deployed automatically. This approach eliminates human error, ensures consistency across environments, and makes your infrastructure reproducible.
In this lesson, you'll learn how to write CloudFormation templates, use Terraform to manage multi-cloud resources, implement automation with AWS Systems Manager, and build CI/CD pipelines that deploy infrastructure changes safely and reliably.
Core Concepts 💻
What is Infrastructure as Code?
Infrastructure as Code (IaC) is the practice of managing and provisioning computing infrastructure through machine-readable definition files, rather than physical hardware configuration or interactive configuration tools.
Key benefits of IaC:
- Consistency: Same code produces identical infrastructure every time
- Version Control: Track changes using Git, rollback if needed
- Documentation: Code serves as living documentation
- Speed: Deploy complex environments in minutes
- Cost Savings: Easily destroy/recreate environments
- Collaboration: Teams can review infrastructure changes like application code
💡 Think of IaC like a recipe: Instead of verbally explaining how to bake a cake each time, you write down the recipe once. Anyone following it gets the same result.
AWS CloudFormation 📋
CloudFormation is AWS's native IaC service. You write templates in JSON or YAML that describe AWS resources, and CloudFormation provisions them in the correct order with proper configuration.
Key CloudFormation concepts:
| Concept | Description |
|---|---|
| Template | JSON/YAML file defining resources |
| Stack | Collection of AWS resources created from a template |
| Change Set | Preview of changes before execution |
| Parameters | Dynamic inputs to customize templates |
| Outputs | Values exported from stacks (e.g., resource IDs) |
| Intrinsic Functions | Built-in functions like Ref, GetAtt, Join |
CloudFormation template structure:
AWSTemplateFormatVersion: '2010-09-09'
Description: 'My infrastructure template'

Parameters:
  EnvironmentName:
    Type: String
    Default: dev
    AllowedValues: [dev, staging, prod]

Resources:
  MyVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-vpc'

  MySecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP traffic
      VpcId: !Ref MyVPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0

Outputs:
  VPCId:
    Description: VPC ID
    Value: !Ref MyVPC
    Export:
      Name: !Sub '${EnvironmentName}-vpc-id'
⚠️ Common CloudFormation mistake: forgetting that logical resource IDs must be unique within a template, and that many physical names (like S3 bucket names) must be unique across a region or globally. Use !Sub to build dynamic, environment-specific names.
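Before executing a template like this, you can preview exactly what CloudFormation will do by creating a change set. A minimal sketch; the stack name, change set name, and file name are placeholders:

# Validate template syntax locally
aws cloudformation validate-template --template-body file://template.yaml

# Preview changes to an existing stack with a change set
aws cloudformation create-change-set \
  --stack-name my-app \
  --change-set-name preview-1 \
  --template-body file://template.yaml \
  --parameters ParameterKey=EnvironmentName,ParameterValue=dev

# Inspect what would be added, modified, or replaced
aws cloudformation describe-change-set \
  --stack-name my-app \
  --change-set-name preview-1

# Apply only after reviewing the change set
aws cloudformation execute-change-set \
  --stack-name my-app \
  --change-set-name preview-1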
Terraform 🌍
Terraform by HashiCorp is a popular open-source IaC tool that works with multiple cloud providers (AWS, Azure, GCP) and uses its own declarative language called HCL (HashiCorp Configuration Language).
Why choose Terraform over CloudFormation?
- Multi-cloud support: Manage AWS, Azure, GCP with one tool
- Module ecosystem: Reusable modules from Terraform Registry
- State management: Tracks actual infrastructure state
- Plan before apply: See exactly what will change
- Larger community: More third-party integrations
Terraform workflow:
TERRAFORM WORKFLOW

📝 Write   →   📥 Init    →   📋 Plan    →   ✅ Apply
 .tf files     download        preview        execute
               providers       changes        changes
                                  │              │
                                  ▼              ▼
                          🗄️ State file (tracks reality)
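In command form, the loop above looks like this (tfplan is just a conventional file name for the saved plan):

terraform init               # download providers/modules, configure backend
terraform validate           # syntax and internal consistency check
terraform plan -out=tfplan   # preview changes and save the plan
terraform apply tfplan       # execute exactly the saved plan
terraform destroy            # tear everything down when finished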
Terraform configuration example:
# Provider configuration
provider "aws" {
  region = var.aws_region
}

# Variables
variable "aws_region" {
  description = "AWS region"
  default     = "us-east-1"
}

variable "instance_type" {
  description = "EC2 instance type"
  default     = "t3.micro"
}

# VPC resource
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name        = "main-vpc"
    Environment = terraform.workspace
  }
}

# Subnet resource
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  availability_zone       = "us-east-1a"
  map_public_ip_on_launch = true

  tags = {
    Name = "public-subnet"
  }
}

# EC2 instance
resource "aws_instance" "web" {
  ami           = data.aws_ami.amazon_linux.id
  instance_type = var.instance_type
  subnet_id     = aws_subnet.public.id

  tags = {
    Name = "web-server"
  }
}

# Data source to fetch the latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux" {
  most_recent = true
  owners      = ["amazon"]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

# Outputs
output "instance_public_ip" {
  description = "Public IP of web server"
  value       = aws_instance.web.public_ip
}
💡 Terraform tip: Always use remote state storage (S3 + DynamoDB) for team collaboration. Local state files don't work well with multiple developers.
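Setting up that remote backend is a one-time task per environment. A hedged sketch; the bucket and table names are examples (the bucket name must be globally unique and must match your backend block):

# S3 bucket for state, with versioning and encryption
aws s3api create-bucket --bucket my-terraform-state --region us-east-1
aws s3api put-bucket-versioning \
  --bucket my-terraform-state \
  --versioning-configuration Status=Enabled
aws s3api put-bucket-encryption \
  --bucket my-terraform-state \
  --server-side-encryption-configuration \
  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'

# DynamoDB table for state locking
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST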
Terraform state file structure:
Terraform State Management
┌──────────────────────────────────────┐
│  📄 terraform.tfstate                │
│  ┌────────────────────────────────┐  │
│  │ {                              │  │
│  │   "version": 4,                │  │
│  │   "resources": [               │  │
│  │     {                          │  │
│  │       "type": "aws_vpc",       │  │
│  │       "name": "main",          │  │
│  │       "instances": [{          │  │
│  │         "id": "vpc-123...",    │  │
│  │         "attributes": {...}    │  │
│  │       }]                       │  │
│  │     }                          │  │
│  │   ]                            │  │
│  │ }                              │  │
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
                │ Compares
                ▼
┌──────────────────────────────────────┐
│  ☁️ Actual AWS Resources             │
│   • vpc-123456 (running)             │
│   • subnet-789 (running)             │
│   • i-abc123 (running)               │
└──────────────────────────────────────┘
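You rarely edit state by hand, but you will inspect it. A few read-only commands (the resource address aws_vpc.main comes from the example above):

terraform state list               # list every resource tracked in state
terraform state show aws_vpc.main  # full recorded attributes of one resource
terraform plan -refresh-only       # detect drift between state and reality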
AWS Systems Manager (SSM) 🔧
Systems Manager provides unified visibility and control over AWS resources. It's crucial for automation, patch management, and configuration management.
Key SSM features for automation:
| Feature | Purpose | Use Case |
|---|---|---|
| Run Command | Execute commands remotely | Install software on 100 EC2 instances |
| State Manager | Maintain configuration state | Ensure CloudWatch agent is always installed |
| Automation | Multi-step workflows | Create AMI, patch, and restart instances |
| Parameter Store | Secure configuration data | Store database passwords, API keys |
| Session Manager | Secure shell access | SSH into instances without bastion hosts |
| Patch Manager | Automated patching | Apply security updates monthly |
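Run Command is the quickest of these to try from the CLI. A hedged example using the managed AWS-RunShellScript document; the tag filter and shell command are illustrative:

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Environment,Values=production" \
  --parameters 'commands=["yum -y update"]' \
  --comment "Monthly patching"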
SSM Automation document example:
schemaVersion: '0.3'
description: 'Create AMI and terminate instance'
parameters:
  InstanceId:
    type: String
    description: 'EC2 Instance ID'
mainSteps:
  - name: createImage
    action: aws:createImage
    inputs:
      InstanceId: '{{ InstanceId }}'
      ImageName: 'Backup-{{ global:DATE_TIME }}'
    outputs:
      - Name: ImageId
        Selector: $.ImageId
        Type: String
  - name: waitForImage
    action: aws:waitForAwsResourceProperty
    inputs:
      Service: ec2
      Api: DescribeImages
      ImageIds:
        - '{{ createImage.ImageId }}'
      PropertySelector: '$.Images[0].State'
      DesiredValues:
        - available
  - name: terminateInstance
    action: aws:executeAwsApi
    inputs:
      Service: ec2
      Api: TerminateInstances
      InstanceIds:
        - '{{ InstanceId }}'
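To use a document like this, register it with SSM and start an execution. A sketch, assuming the YAML above is saved as create-ami.yaml; the document name and instance ID are placeholders:

aws ssm create-document \
  --name "CreateAmiAndTerminate" \
  --document-type "Automation" \
  --document-format YAML \
  --content file://create-ami.yaml

aws ssm start-automation-execution \
  --document-name "CreateAmiAndTerminate" \
  --parameters "InstanceId=i-0abc123def456789"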
CI/CD for Infrastructure 🚀
Continuous Integration/Continuous Deployment (CI/CD) for infrastructure means automatically testing and deploying infrastructure changes through a pipeline.
Infrastructure CI/CD pipeline stages:
┌──────────────────────────────────────────────────────────────┐
│                INFRASTRUCTURE CI/CD PIPELINE                 │
└──────────────────────────────────────────────────────────────┘

📝 Code Push (Git)
        │
        ▼
┌─────────────────┐
│  1. VALIDATE    │  ← Syntax check (terraform validate)
│                 │  ← Linting (tflint, cfn-lint)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  2. SECURITY    │  ← Scan for secrets (git-secrets)
│     SCAN        │  ← Policy check (Checkov, tfsec)
└────────┬────────┘  ← Compliance (AWS Config Rules)
         │
         ▼
┌─────────────────┐
│  3. PLAN        │  ← Generate change preview
│                 │  ← Cost estimation (Infracost)
└────────┬────────┘  ← Manual approval gate
         │
         ▼
┌─────────────────┐
│  4. DEPLOY DEV  │  ← Apply to dev environment
│                 │  ← Integration tests
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  5. DEPLOY PROD │  ← Blue/green deployment
│                 │  ← Rollback on failure
└─────────────────┘
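Most of these stages can be rehearsed locally before wiring them into a pipeline. A rough equivalent of stages 1-3 on a developer machine, assuming tflint, tfsec, checkov, and infracost are installed (infracost needs an API key configured):

terraform fmt -check -recursive   # stage 1: formatting
terraform validate                # stage 1: syntax
tflint                            # stage 1: linting
tfsec .                           # stage 2: security scan
checkov -d .                      # stage 2: policy checks
terraform plan -out=tfplan        # stage 3: change preview
infracost breakdown --path .      # stage 3: cost estimate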
AWS CodeBuild buildspec for an IaC stage in CodePipeline:
version: 0.2
phases:
  install:
    commands:
      - echo "Installing Terraform"
      - wget https://releases.hashicorp.com/terraform/1.5.0/terraform_1.5.0_linux_amd64.zip
      - unzip terraform_1.5.0_linux_amd64.zip
      - mv terraform /usr/local/bin/
  pre_build:
    commands:
      - echo "Initializing Terraform"
      - terraform init -backend-config="bucket=${STATE_BUCKET}"
      - echo "Validating Terraform configuration"
      - terraform validate
      - echo "Running security scan"
      - tfsec .
  build:
    commands:
      - echo "Planning Terraform changes"
      - terraform plan -out=tfplan
      - echo "Applying Terraform changes"
      - terraform apply -auto-approve tfplan
  post_build:
    commands:
      - echo "Deployment complete"
      - terraform output -json > outputs.json
artifacts:
  files:
    # State lives in the S3 backend, so only the outputs are exported
    - outputs.json
GitOps workflow for infrastructure:
GitOps Infrastructure Deployment
Developer           Git Repo         CI/CD Pipeline       AWS
  👨‍💻                  📦                  🤖               ☁️
   │                   │                   │                │
   │ 1. Push code      │                   │                │
   │──────────────────►│                   │                │
   │                   │ 2. Trigger        │                │
   │                   │──────────────────►│                │
   │                   │                   │ 3. Validate    │
   │                   │                   │──┐             │
   │                   │                   │◄─┘             │
   │                   │                   │ 4. Plan        │
   │                   │                   │──┐             │
   │                   │                   │◄─┘             │
   │        5. Review PR                   │                │
   │◄──────────────────────────────────────│                │
   │ 6. Approve        │                   │                │
   │──────────────────►│                   │                │
   │                   │ 7. Merge          │                │
   │                   │──────────────────►│                │
   │                   │                   │ 8. Deploy      │
   │                   │                   │───────────────►│
   │                   │                   │   9. Confirm   │
   │                   │                   │◄───────────────│
   │                   │ 10. Update state  │                │
   │                   │◄──────────────────│                │
💡 Best practice: Use separate AWS accounts for dev, staging, and production. Deploy to dev automatically, but require manual approval for production.
Examples 🎯
Example 1: Multi-Tier Application with CloudFormation
Let's create a complete web application infrastructure with VPC, subnets, load balancer, auto-scaling group, and RDS database.
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Multi-tier web application infrastructure'

Parameters:
  EnvironmentName:
    Type: String
    Default: production
  InstanceType:
    Type: String
    Default: t3.small
    AllowedValues: [t3.micro, t3.small, t3.medium]
  DBPassword:
    Type: String
    NoEcho: true
    MinLength: 8
    Description: 'RDS database password'

Resources:
  # VPC and Networking
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-vpc'

  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-public-1'

  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-public-2'

  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.11.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-private-1'

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.12.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-private-2'

  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-igw'

  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway

  # Application Load Balancer
  ALB:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: !Sub '${EnvironmentName}-alb'
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref ALBSecurityGroup
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-alb'

  ALBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow HTTP/HTTPS
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: !Sub '${EnvironmentName}-tg'
      Port: 80
      Protocol: HTTP
      VpcId: !Ref VPC
      HealthCheckPath: /health
      HealthCheckIntervalSeconds: 30
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 3

  ALBListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ALB
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref TargetGroup

  # Auto Scaling Group
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Sub '${EnvironmentName}-lt'
      LaunchTemplateData:
        InstanceType: !Ref InstanceType
        ImageId: '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-x86_64-gp2}}'
        SecurityGroupIds:
          - !Ref WebServerSecurityGroup
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            yum update -y
            yum install -y httpd
            systemctl start httpd
            systemctl enable httpd
            echo "<h1>Hello from ${EnvironmentName}</h1>" > /var/www/html/index.html

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      AutoScalingGroupName: !Sub '${EnvironmentName}-asg'
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 6
      DesiredCapacity: 2
      VPCZoneIdentifier:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      TargetGroupARNs:
        - !Ref TargetGroup
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-web-server'
          PropagateAtLaunch: true

  WebServerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow traffic from ALB
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId: !Ref ALBSecurityGroup

  # RDS Database
  DBSubnetGroup:
    Type: AWS::RDS::DBSubnetGroup
    Properties:
      DBSubnetGroupDescription: Subnet group for RDS
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2

  DBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Allow MySQL from web servers
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 3306
          ToPort: 3306
          SourceSecurityGroupId: !Ref WebServerSecurityGroup

  Database:
    Type: AWS::RDS::DBInstance
    Properties:
      DBInstanceIdentifier: !Sub '${EnvironmentName}-db'
      DBInstanceClass: db.t3.micro
      Engine: mysql
      EngineVersion: '8.0'
      MasterUsername: admin
      MasterUserPassword: !Ref DBPassword
      AllocatedStorage: 20
      DBSubnetGroupName: !Ref DBSubnetGroup
      VPCSecurityGroups:
        - !Ref DBSecurityGroup
      BackupRetentionPeriod: 7
      MultiAZ: true
      StorageEncrypted: true

Outputs:
  LoadBalancerDNS:
    Description: DNS name of load balancer
    Value: !GetAtt ALB.DNSName
    Export:
      Name: !Sub '${EnvironmentName}-alb-dns'
  DatabaseEndpoint:
    Description: RDS endpoint
    Value: !GetAtt Database.Endpoint.Address
    Export:
      Name: !Sub '${EnvironmentName}-db-endpoint'
What this template does:
- Creates VPC with public/private subnets across 2 availability zones
- Sets up Internet Gateway for public subnet internet access
- Deploys Application Load Balancer in public subnets
- Creates Auto Scaling Group with 2-6 EC2 instances in private subnets
- Provisions RDS MySQL database with Multi-AZ for high availability
- Configures security groups to allow traffic flow: Internet → ALB → EC2 → RDS
💡 Deploy this template: aws cloudformation create-stack --stack-name my-app --template-body file://template.yaml --parameters ParameterKey=DBPassword,ParameterValue=SecurePass123
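After kicking off the stack, you can block until creation finishes and then read the exported values; a small follow-up sketch using the same stack name:

aws cloudformation wait stack-create-complete --stack-name my-app

aws cloudformation describe-stacks \
  --stack-name my-app \
  --query "Stacks[0].Outputs" \
  --output table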
Example 2: Terraform Modules for Reusability
Terraform modules let you package reusable infrastructure components. Here's how to create and use a VPC module.
File structure:
project/
├── main.tf
├── variables.tf
├── outputs.tf
└── modules/
    └── vpc/
        ├── main.tf
        ├── variables.tf
        └── outputs.tf
modules/vpc/variables.tf:
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "availability_zones" {
description = "List of AZs"
type = list(string)
}
modules/vpc/main.tf:
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-vpc"
Environment = var.environment
}
}
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-${count.index + 1}"
Type = "public"
}
}
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-private-${count.index + 1}"
Type = "private"
}
}
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-igw"
}
}
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.environment}-public-rt"
}
}
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
modules/vpc/outputs.tf:
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
output "public_subnet_ids" {
description = "List of public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "List of private subnet IDs"
value = aws_subnet.private[*].id
}
Root main.tf (uses the module):
provider "aws" {
region = "us-east-1"
}
module "dev_vpc" {
source = "./modules/vpc"
vpc_cidr = "10.0.0.0/16"
environment = "development"
availability_zones = ["us-east-1a", "us-east-1b"]
}
module "prod_vpc" {
source = "./modules/vpc"
vpc_cidr = "10.1.0.0/16"
environment = "production"
availability_zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
}
resource "aws_security_group" "app" {
name = "app-sg"
description = "Application security group"
vpc_id = module.prod_vpc.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
output "dev_vpc_id" {
value = module.dev_vpc.vpc_id
}
output "prod_public_subnets" {
value = module.prod_vpc.public_subnet_ids
}
Benefits of this modular approach:
- Reusability: Same VPC code creates dev and prod environments
- Consistency: Both environments follow identical patterns
- Maintainability: Update module once, affects all uses
- Testing: Test module independently before using in production
🧠 Mnemonic for Terraform commands: "I Plan to Apply Destruction" → Init, Plan, Apply, Destroy
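Module sources are resolved at init time, so the typical loop over the configuration above looks like this (the -target flag is handy while iterating on a single module, though it shouldn't become a habit):

terraform init                         # downloads/links the local modules
terraform plan -target=module.dev_vpc  # preview just one module's resources
terraform apply                        # create both VPCs plus the security group
terraform output prod_public_subnets   # read a module-derived output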
Example 3: SSM Automation with Parameter Store
Use Parameter Store for configuration management and SSM Automation for deployment workflows.
Store configuration in Parameter Store:
# Store database connection string
aws ssm put-parameter \
  --name "/myapp/prod/db-connection" \
  --value "mysql://db.example.com:3306/mydb" \
  --type "String" \
  --description "Production database connection"

# Store encrypted API key
aws ssm put-parameter \
  --name "/myapp/prod/api-key" \
  --value "sk-1234567890abcdef" \
  --type "SecureString" \
  --key-id "alias/aws/ssm" \
  --description "Production API key"

# Store configuration JSON
aws ssm put-parameter \
  --name "/myapp/prod/config" \
  --value '{"timeout":30,"retries":3,"log_level":"info"}' \
  --type "String"
Retrieve parameters in application code:
import boto3
import json

ssm = boto3.client('ssm', region_name='us-east-1')

# Get a single parameter
response = ssm.get_parameter(
    Name='/myapp/prod/db-connection',
    WithDecryption=False
)
db_connection = response['Parameter']['Value']

# Get an encrypted parameter
response = ssm.get_parameter(
    Name='/myapp/prod/api-key',
    WithDecryption=True  # Decrypt the SecureString
)
api_key = response['Parameter']['Value']

# Get multiple parameters at once. Each GetParametersByPath call returns
# at most 10 parameters, so use a paginator for larger trees.
config = {}
paginator = ssm.get_paginator('get_parameters_by_path')
for page in paginator.paginate(Path='/myapp/prod/', Recursive=True, WithDecryption=True):
    for param in page['Parameters']:
        key = param['Name'].split('/')[-1]  # Last segment of the path
        config[key] = param['Value']

print(json.dumps(config, indent=2))
SSM Automation document for blue/green deployment:
schemaVersion: '0.3'
description: >-
  Blue/Green deployment with rollback. Assumes the Blue environment is
  currently serving traffic and the Green environment is already deployed
  with the new version.
parameters:
  BlueASGName:
    type: String
    description: 'Blue Auto Scaling Group name'
  GreenASGName:
    type: String
    description: 'Green Auto Scaling Group name'
  TargetGroupArn:
    type: String
    description: 'ALB Target Group ARN'
  HealthCheckUrl:
    type: String
    description: 'Health check endpoint'
    default: 'http://example.com/health'
mainSteps:
  # Step 1: Scale up the Green environment
  - name: ScaleGreen
    action: aws:executeAwsApi
    inputs:
      Service: autoscaling
      Api: SetDesiredCapacity
      AutoScalingGroupName: '{{ GreenASGName }}'
      DesiredCapacity: 2

  # Step 2: Wait for Green instances to be healthy
  - name: WaitForGreenHealthy
    action: aws:waitForAwsResourceProperty
    timeoutSeconds: 600
    inputs:
      Service: autoscaling
      Api: DescribeAutoScalingGroups
      AutoScalingGroupNames:
        - '{{ GreenASGName }}'
      PropertySelector: '$.AutoScalingGroups[0].Instances[?(@.HealthStatus=="Healthy")]'
      DesiredValues:
        - '2'  # Wait for 2 healthy instances

  # Step 3: Perform health check
  - name: HealthCheckGreen
    action: aws:executeScript
    inputs:
      Runtime: python3.8
      Handler: health_check
      Script: |
        import urllib.request

        def health_check(events, context):
            url = events['HealthCheckUrl']
            try:
                response = urllib.request.urlopen(url, timeout=10)
                if response.status == 200:
                    return {'status': 'healthy'}
                else:
                    raise Exception(f'Health check failed: {response.status}')
            except Exception as e:
                raise Exception(f'Health check error: {str(e)}')
      InputPayload:
        HealthCheckUrl: '{{ HealthCheckUrl }}'

  # Step 4: Attach the Green ASG to the target group (shifts traffic to Green;
  # target groups track ASG members, so we attach/detach the groups rather
  # than registering individual instance IDs)
  - name: SwitchToGreen
    action: aws:executeAwsApi
    inputs:
      Service: autoscaling
      Api: AttachLoadBalancerTargetGroups
      AutoScalingGroupName: '{{ GreenASGName }}'
      TargetGroupARNs:
        - '{{ TargetGroupArn }}'

  # Step 5: Detach the Blue ASG from the target group
  - name: DetachBlue
    action: aws:executeAwsApi
    inputs:
      Service: autoscaling
      Api: DetachLoadBalancerTargetGroups
      AutoScalingGroupName: '{{ BlueASGName }}'
      TargetGroupARNs:
        - '{{ TargetGroupArn }}'

  # Step 6: Wait for connection draining
  - name: WaitForDraining
    action: aws:sleep
    inputs:
      Duration: PT5M  # 5 minutes

  # Step 7: Scale down Blue
  - name: ScaleDownBlue
    action: aws:executeAwsApi
    inputs:
      Service: autoscaling
      Api: SetDesiredCapacity
      AutoScalingGroupName: '{{ BlueASGName }}'
      DesiredCapacity: 0
Execute the automation:
aws ssm start-automation-execution \
  --document-name "BlueGreenDeployment" \
  --parameters "BlueASGName=my-app-blue-asg,GreenASGName=my-app-green-asg,TargetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-tg/abc123,HealthCheckUrl=https://green.example.com/health"
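Executions run asynchronously; the command above returns an execution ID (shown below as a placeholder) that you can poll for progress:

# The execution ID comes from the start-automation-execution response
EXECUTION_ID="a1b2c3d4-5678-90ab-cdef-example12345"
aws ssm get-automation-execution \
  --automation-execution-id "$EXECUTION_ID" \
  --query "AutomationExecution.{Status:AutomationExecutionStatus,Step:CurrentStepName}"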
Example 4: Complete CI/CD Pipeline with GitHub Actions
Implement a complete infrastructure pipeline using GitHub Actions, Terraform, and AWS.
.github/workflows/terraform.yml:
name: 'Terraform Infrastructure Pipeline'

on:
  push:
    branches:
      - main
      - develop
  pull_request:
    branches:
      - main

env:
  AWS_REGION: us-east-1
  TERRAFORM_VERSION: 1.5.0

jobs:
  validate:
    name: 'Validate Terraform'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        run: terraform init -backend=false

      - name: Terraform Validate
        run: terraform validate

  security:
    name: 'Security Scan'
    runs-on: ubuntu-latest
    needs: validate
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Run tfsec
        uses: aquasecurity/tfsec-action@v1.0.0
        with:
          soft_fail: false

      - name: Run Checkov
        uses: bridgecrewio/checkov-action@master
        with:
          directory: .
          framework: terraform
          quiet: false
          soft_fail: false

  plan:
    name: 'Terraform Plan'
    runs-on: ubuntu-latest
    needs: [validate, security]
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}

      - name: Terraform Init
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=infrastructure/terraform.tfstate" \
            -backend-config="region=${{ env.AWS_REGION }}"

      - name: Terraform Plan
        id: plan
        run: |
          terraform plan -out=tfplan -no-color | tee plan.txt
          echo "exitcode=${PIPESTATUS[0]}" >> "$GITHUB_OUTPUT"

      - name: Estimate costs
        uses: infracost/infracost-action@v1
        with:
          apikey: ${{ secrets.INFRACOST_API_KEY }}
          path: .

      - name: Comment PR with plan
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          script: |
            const fs = require('fs');
            const plan = fs.readFileSync('plan.txt', 'utf8');
            const output = `#### Terraform Plan 📋
            <details>
            <summary>Show Plan</summary>

            \`\`\`terraform
            ${plan}
            \`\`\`
            </details>

            *Pushed by: @${{ github.actor }}*`;
            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: output
            });

      - name: Upload plan artifact
        uses: actions/upload-artifact@v3
        with:
          name: tfplan
          path: tfplan

  apply-dev:
    name: 'Apply to Development'
    runs-on: ubuntu-latest
    needs: plan
    if: github.ref == 'refs/heads/develop' && github.event_name == 'push'
    environment:
      name: development
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}

      - name: Download plan
        uses: actions/download-artifact@v3
        with:
          name: tfplan

      - name: Terraform Init
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=infrastructure/terraform.tfstate" \
            -backend-config="region=${{ env.AWS_REGION }}"

      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

      - name: Run integration tests
        run: |
          # Add your integration test commands here
          echo "Running integration tests..."
          # npm test or pytest or equivalent

  apply-prod:
    name: 'Apply to Production'
    runs-on: ubuntu-latest
    needs: plan
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    environment:
      name: production
      url: https://prod.example.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v2
        with:
          aws-access-key-id: ${{ secrets.PROD_AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.PROD_AWS_SECRET_ACCESS_KEY }}
          aws-region: ${{ env.AWS_REGION }}

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: ${{ env.TERRAFORM_VERSION }}

      - name: Terraform Init
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.PROD_TF_STATE_BUCKET }}" \
            -backend-config="key=infrastructure/terraform.tfstate" \
            -backend-config="region=${{ env.AWS_REGION }}"

      - name: Download plan
        uses: actions/download-artifact@v3
        with:
          name: tfplan

      - name: Terraform Apply
        run: terraform apply -auto-approve tfplan

      - name: Smoke tests
        run: |
          echo "Running smoke tests..."
          curl -f https://prod.example.com/health || exit 1

      - name: Notify success
        if: success()
        run: |
          echo "Production deployment successful!"
          # Add Slack notification or similar

      - name: Rollback on failure
        if: failure()
        run: |
          echo "Deployment failed, initiating rollback..."
          # Add rollback logic
This pipeline implements:
- Validation stage: Format check, syntax validation
- Security scanning: tfsec and Checkov for policy violations
- Plan generation: Creates execution plan with cost estimates
- PR comments: Posts plan details on pull requests
- Environment-specific deployment: Dev auto-deploys, prod requires manual approval
- Integration testing: Runs tests after dev deployment
- Smoke testing: Validates production deployment
- Rollback capability: Handles deployment failures
Common Mistakes ⚠️
1. Hardcoding Values Instead of Using Parameters
❌ Wrong:
Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-production-bucket-2024
✅ Right:
Parameters:
  EnvironmentName:
    Type: String

Resources:
  MyBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: !Sub '${EnvironmentName}-bucket-${AWS::AccountId}'
Why: Hardcoded values make templates inflexible and can't be reused across environments.
2. Not Using Remote State in Terraform
❌ Wrong: Using a local terraform.tfstate file
✅ Right:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state"
    key            = "prod/terraform.tfstate"
    region         = "us-east-1"
    dynamodb_table = "terraform-locks"
    encrypt        = true
  }
}
Why: Local state doesn't work for teams and risks state corruption. S3 + DynamoDB provides locking and consistency.
3. Forgetting Resource Dependencies
❌ Wrong:
resource "aws_instance" "web" {
  subnet_id = aws_subnet.public.id
  # Terraform sees the subnet reference, but not that the instance
  # also needs the internet gateway to exist first!
}

resource "aws_subnet" "public" {
  vpc_id = aws_vpc.main.id
}
✅ Right:
resource "aws_instance" "web" {
  subnet_id = aws_subnet.public.id

  depends_on = [
    aws_internet_gateway.main
  ]
}
Why: Terraform usually figures out dependencies, but sometimes explicit depends_on is needed.
4. Not Implementing State Locking
❌ Wrong: Multiple team members running terraform apply simultaneously
✅ Right: Use a DynamoDB table for state locking:
aws dynamodb create-table \
  --table-name terraform-locks \
  --attribute-definitions AttributeName=LockID,AttributeType=S \
  --key-schema AttributeName=LockID,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST
Why: Without locking, concurrent applies corrupt state and create duplicate resources.
5. Mixing Manual Changes with IaC
❌ Wrong: Creating infrastructure with Terraform, then manually modifying resources in the console
✅ Right: All changes through code:
# Detect drift
terraform plan

# Import manually created resources
terraform import aws_instance.web i-1234567890abcdef0

# Or remove from state if it shouldn't be managed
terraform state rm aws_instance.web
Why: Manual changes cause "drift" where actual infrastructure doesn't match code. Terraform may overwrite manual changes.
6. Not Using Version Constraints
❌ Wrong:
terraform {
  required_version = ">= 0.12"
}

provider "aws" {
  region = "us-east-1"
}
✅ Right:
terraform {
  required_version = "~> 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}
Why: Without version pinning, provider updates can break your code unexpectedly.
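Terraform records the exact provider versions it selected in .terraform.lock.hcl; committing that file makes the pin effective for the whole team. A minimal sketch (the platforms listed are examples):

terraform init                 # writes .terraform.lock.hcl with pinned provider hashes
terraform providers lock \
  -platform=linux_amd64 \
  -platform=darwin_arm64       # record hashes for every platform your team uses
git add .terraform.lock.hcl    # commit the lock file alongside your code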
7. Not Testing Before Production
❌ Wrong: Applying directly to production
✅ Right: Use workspaces or separate state files:
# Using workspaces
terraform workspace new dev
terraform apply

terraform workspace new staging
terraform apply

terraform workspace new prod
terraform apply  # Only after testing!
Why: Infrastructure changes can have catastrophic effects. Always test in non-production first.
8. Ignoring Security Scanning
❌ Wrong: Deploying without security checks
✅ Right: Integrate security tools:
# Scan Terraform code
tfsec .

# Check for policy violations
checkov -d .

# Lint CloudFormation templates
cfn-lint template.yaml
Why: Security issues in infrastructure code can expose your entire environment.
Key Takeaways 🎯
✅ Infrastructure as Code treats infrastructure like application code: versioned, reviewed, and tested
✅ CloudFormation is AWS-native IaC with deep AWS integration
✅ Terraform offers multi-cloud support and a larger ecosystem
✅ Always use remote state (S3 + DynamoDB) for team collaboration
✅ Parameters and variables make templates reusable across environments
✅ Modules enable code reuse and standardization
✅ CI/CD pipelines automate validation, security scanning, and deployment
✅ Test in dev/staging before touching production
✅ Use change sets/plan to preview infrastructure changes
✅ Implement security scanning with tfsec, Checkov, cfn-lint
✅ Avoid manual changes - keep everything in code
✅ Version constraints prevent breaking changes from provider updates
📋 Quick Reference Card: IaC Commands
| Tool | Command | Purpose |
|---|---|---|
| CloudFormation | aws cloudformation create-stack | Create new stack |
| | aws cloudformation update-stack | Update existing stack |
| | aws cloudformation create-change-set | Preview changes |
| | aws cloudformation delete-stack | Delete stack |
| Terraform | terraform init | Initialize providers |
| | terraform plan | Preview changes |
| | terraform apply | Execute changes |
| | terraform destroy | Delete resources |
| | terraform state list | List managed resources |
| SSM | aws ssm put-parameter | Store parameter |
| | aws ssm get-parameter | Retrieve parameter |
| | aws ssm start-automation-execution | Run automation |
Best Practice Checklist:
- ✅ Store state remotely (S3 + DynamoDB)
- ✅ Use variables/parameters for flexibility
- ✅ Implement CI/CD pipeline
- ✅ Run security scans (tfsec, Checkov)
- ✅ Test in dev before prod
- ✅ Version control everything
- ✅ Use modules for reusability
- ✅ Document your infrastructure
Further Study 📚
AWS CloudFormation Documentation: https://docs.aws.amazon.com/cloudformation/ - Official CloudFormation user guide with template reference
Terraform Documentation: https://developer.hashicorp.com/terraform/docs - Complete Terraform documentation including tutorials and best practices
AWS Systems Manager Documentation: https://docs.aws.amazon.com/systems-manager/ - Guide for SSM automation, Parameter Store, and Run Command
🚀 Practice tip: Build a complete three-tier application using IaC, deploy it to dev, then promote to production using a CI/CD pipeline. This hands-on experience solidifies all concepts covered in this lesson!