Cost & Performance Engineering
FinOps, cost allocation, tagging, rightsizing, and designing for failure
AWS Cost & Performance Engineering
Master AWS cost optimization and performance engineering with free flashcards and spaced repetition practice. This lesson covers cost monitoring strategies, right-sizing resources, performance tuning techniques, and architectural optimization patterns: essential concepts for the AWS Solutions Architect and SysOps Administrator certifications.
Welcome to Cost & Performance Engineering
In cloud computing, cost optimization and performance engineering are two sides of the same coin. Organizations often overprovision resources "just to be safe," resulting in wasted spend, or undersize resources to save money, creating performance bottlenecks. The art of AWS cost and performance engineering lies in finding the sweet spot where your applications run efficiently at the lowest sustainable cost.
Why This Matters:
- Industry surveys consistently estimate that organizations waste roughly a third of their cloud spend on unused or oversized resources
- Poor performance impacts user experience and business revenue
- Well-architected systems achieve both cost efficiency and high performance
- AWS offers over 200 services; choosing the right combination saves money and boosts speed
Pro Tip: The AWS Well-Architected Framework includes both Cost Optimization and Performance Efficiency among its six pillars. Mastering both makes you invaluable to any organization.
Core Concepts: Understanding AWS Cost Structure
1. AWS Pricing Models
AWS offers several pricing models, each suited to different workload patterns:
| Pricing Model | Best For | Savings vs On-Demand | Commitment |
|---|---|---|---|
| On-Demand | Variable workloads, testing | Baseline (0%) | None |
| Reserved Instances | Steady-state workloads | Up to 72% | 1-3 years |
| Savings Plans | Flexible compute usage | Up to 66% | 1-3 years |
| Spot Instances | Fault-tolerant, flexible workloads | Up to 90% | None (can be interrupted) |
Reserved Instances (RIs) provide the deepest discounts but require upfront commitment to specific instance families and regions. Think of them like buying a gym membership: you pay up front for guaranteed access.
Savings Plans offer similar discounts but with more flexibility. You commit to a dollar amount per hour (e.g., $10/hour) rather than specific instance types. The plan automatically applies to any matching compute usage.
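For a rough sense of how these models compare, a back-of-the-envelope calculation like the sketch below works well. The hourly rates are illustrative assumptions, not current prices, so check the pricing pages (or the Pricing API) before committing.
## Rough comparison of pricing models for a single m5.large (illustrative rates)
HOURS_PER_MONTH = 730

on_demand_rate = 0.096      # $/hour, approximate On-Demand rate
savings_plan_rate = 0.065   # $/hour, assumed 1-year Compute Savings Plan effective rate
reserved_rate = 0.060       # $/hour, assumed 1-year Reserved Instance effective rate

for label, rate in [('On-Demand', on_demand_rate),
                    ('Savings Plan', savings_plan_rate),
                    ('Reserved Instance', reserved_rate)]:
    monthly = rate * HOURS_PER_MONTH
    discount = (1 - rate / on_demand_rate) * 100
    print(f"{label:18s} ${monthly:8.2f}/month ({discount:.0f}% vs On-Demand)")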
Spot Instances leverage AWS's unused capacity. AWS can reclaim them with 2-minute notice, making them perfect for batch processing, CI/CD, data analysis, and containerized workloads with checkpointing.
## Example: Launching a Spot Instance with boto3
import boto3
ec2 = boto3.client('ec2')
response = ec2.request_spot_instances(
SpotPrice='0.05',
InstanceCount=1,
Type='one-time',
LaunchSpecification={
'ImageId': 'ami-0abcdef1234567890',
'InstanceType': 't3.medium',
'KeyName': 'my-key-pair',
'SecurityGroupIds': ['sg-0123456789abcdef0'],
'SubnetId': 'subnet-12345678',
'IamInstanceProfile': {
'Name': 'MyInstanceProfile'
}
}
)
print(f"Spot request ID: {response['SpotInstanceRequests'][0]['SpotInstanceRequestId']}")
Memory Device - ROSS: Reserved for steady loads, On-demand for flexibility, Savings Plans for compute flexibility, Spot for interruptible workloads.
2. Cost Monitoring & Visibility Tools
AWS Cost Explorer provides visual analysis of spending patterns over time. It shows:
- Historical cost data (up to 12 months)
- Forecasting for next 12 months
- Cost breakdown by service, linked account, tag, or region
- Reserved Instance recommendations
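The same data is available programmatically through the Cost Explorer API, which is handy for custom dashboards or weekly cost reports. A minimal sketch (each API request is billed at $0.01):
## Last month's spend grouped by service, via the Cost Explorer API
import boto3
from datetime import date, timedelta

ce = boto3.client('ce')

end = date.today().replace(day=1)                  # first day of the current month
start = (end - timedelta(days=1)).replace(day=1)   # first day of the previous month

response = ce.get_cost_and_usage(
    TimePeriod={'Start': start.isoformat(), 'End': end.isoformat()},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}]
)

for group in response['ResultsByTime'][0]['Groups']:
    amount = float(group['Metrics']['UnblendedCost']['Amount'])
    if amount > 1:  # skip sub-dollar noise
        print(f"{group['Keys'][0]}: ${amount:.2f}")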
AWS Budgets creates custom alerts when costs or usage exceed thresholds:
{
"Budget": {
"BudgetName": "Monthly-EC2-Budget",
"BudgetLimit": {
"Amount": "1000",
"Unit": "USD"
},
"TimeUnit": "MONTHLY",
"BudgetType": "COST",
"CostFilters": {
"Service": ["Amazon Elastic Compute Cloud - Compute"]
}
},
"NotificationsWithSubscribers": [
{
"Notification": {
"NotificationType": "ACTUAL",
"ComparisonOperator": "GREATER_THAN",
"Threshold": 80,
"ThresholdType": "PERCENTAGE"
},
"Subscribers": [
{
"SubscriptionType": "EMAIL",
"Address": "ops-team@example.com"
}
]
}
]
}
AWS Cost and Usage Report (CUR) delivers the most granular data: hourly resource usage with custom tags. It writes to S3 for analysis with Athena, QuickSight, or third-party tools.
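As a sketch of what that analysis looks like, the query below sums unblended cost per service from a CUR table in Athena. The database, table, and results-bucket names are placeholders for whatever your CUR/Athena integration created; the column names follow the standard CUR schema.
## Top services by spend from the Cost and Usage Report in Athena
import boto3

athena = boto3.client('athena')

query = """
SELECT line_item_product_code,
       ROUND(SUM(line_item_unblended_cost), 2) AS cost
FROM cur_table
WHERE year = '2024' AND month = '12'
GROUP BY line_item_product_code
ORDER BY cost DESC
LIMIT 10
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'cur_db'},                          # placeholder CUR database
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/cur/'}  # placeholder results bucket
)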
AWS Trusted Advisor provides real-time recommendations across several check categories, including cost optimization. Typical cost findings include:
- Low utilization EC2 instances
- Idle RDS databases
- Unassociated Elastic IPs
- Underutilized EBS volumes
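Trusted Advisor results can also be pulled programmatically through the AWS Support API (a Business or Enterprise Support plan is required). A minimal sketch that counts flagged resources per cost check; the category value follows the API's naming:
## List flagged resources for Trusted Advisor cost optimization checks
import boto3

support = boto3.client('support', region_name='us-east-1')  # Support API endpoint

checks = support.describe_trusted_advisor_checks(language='en')['checks']
for check in checks:
    if check['category'] == 'cost_optimizing':
        result = support.describe_trusted_advisor_check_result(checkId=check['id'])
        flagged = result['result'].get('flaggedResources', [])
        print(f"{check['name']}: {len(flagged)} flagged resources")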
COST MONITORING WORKFLOW

Cost Explorer
    ↓
Identify high-spend services
    ↓
Drill down by tag/resource
    ↓
Trusted Advisor checks
    ↓
Create optimization plan
    ↓
Set Budget alerts
    ↓
Monitor CUR in Athena
    ↓
Implement changes
    ↓
Measure impact
Pro Tip: Tag everything! Use tags like Environment, Project, Owner, and CostCenter to enable detailed cost allocation. AWS supports up to 50 tags per resource.
3. Right-Sizing Resources
Right-sizing means matching instance types and sizes to actual workload requirements. Most organizations overprovision by 30-50%.
CloudWatch Metrics reveal actual resource utilization:
## Check average CPU utilization over 14 days
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
response = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[
{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}
],
StartTime=datetime.utcnow() - timedelta(days=14),
EndTime=datetime.utcnow(),
Period=3600, # 1 hour intervals
Statistics=['Average', 'Maximum']
)
avg_cpu = sum([d['Average'] for d in response['Datapoints']]) / len(response['Datapoints'])
max_cpu = max([d['Maximum'] for d in response['Datapoints']])
print(f"Average CPU: {avg_cpu:.2f}%")
print(f"Maximum CPU: {max_cpu:.2f}%")
if avg_cpu < 10:
    print("⚠️ Consider downsizing or stopping this instance")
elif avg_cpu > 70:
    print("⚠️ Consider upsizing to prevent performance issues")
AWS Compute Optimizer uses machine learning to analyze historical utilization and recommend optimal instance types:
## Get recommendations via AWS CLI
aws compute-optimizer get-ec2-instance-recommendations \
--instance-arns arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0
Output includes:
- Current instance type and pricing
- Recommended instance types (up to 3 options)
- Projected savings
- Performance risk assessment (Very Low, Low, Medium, High)
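The same recommendations can be pulled with boto3 and folded into a right-sizing report. A sketch using the Compute Optimizer API; the response field names below are as I recall them, so verify against the current SDK documentation:
## Pull Compute Optimizer recommendations with boto3
import boto3

co = boto3.client('compute-optimizer')

response = co.get_ec2_instance_recommendations()
for rec in response['instanceRecommendations']:
    best = rec['recommendationOptions'][0]  # options are ranked, best first
    print(rec['instanceArn'])
    print(f"  finding: {rec['finding']} (current: {rec['currentInstanceType']})")
    print(f"  recommended: {best['instanceType']} (performance risk: {best['performanceRisk']})")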
| Metric | Target Range | Action if Below | Action if Above |
|---|---|---|---|
| CPU | 40-70% | Downsize | Upsize or scale out |
| Memory | 50-80% | Switch to compute-optimized | Switch to memory-optimized |
| Network | 30-60% | Consider smaller instance | Enable enhanced networking |
| Disk I/O | 40-70% | Use gp3 instead of io2 | Increase IOPS or use io2 |
4. Storage Optimization Strategies
S3 Storage Classes offer tiered pricing based on access patterns:
| Storage Class | Use Case | Availability | Cost (relative) |
|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | $$$$ |
| S3 Intelligent-Tiering | Unknown/changing access | 99.9% | $$$$ (auto-optimized) |
| S3 Standard-IA | Infrequent access | 99.9% | $$$ |
| S3 One Zone-IA | Infrequent, reproducible | 99.5% | $$ |
| S3 Glacier Instant | Archive, instant retrieval | 99.9% | $$ |
| S3 Glacier Flexible | Archive, minutes-to-hours retrieval | 99.99% | $ |
| S3 Glacier Deep Archive | Long-term archive, 12hr retrieval | 99.99% | ¢ |
S3 Lifecycle Policies automatically transition objects between classes:
{
"Rules": [
{
"Id": "LogArchivalPolicy",
"Status": "Enabled",
"Filter": {
"Prefix": "logs/"
},
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER_IR"
},
{
"Days": 365,
"StorageClass": "DEEP_ARCHIVE"
}
],
"Expiration": {
"Days": 2555
}
}
]
}
EBS Volume Optimization:
## Identify unattached EBS volumes (wasted cost)
import boto3
ec2 = boto3.client('ec2')
volumes = ec2.describe_volumes(
Filters=[{'Name': 'status', 'Values': ['available']}]
)['Volumes']
total_cost = 0
for vol in volumes:
    size_gb = vol['Size']
    vol_type = vol['VolumeType']
    # Approximate monthly cost (gp3 = $0.08/GB)
    monthly_cost = size_gb * 0.08
    total_cost += monthly_cost
    print(f"Volume {vol['VolumeId']}: {size_gb}GB {vol_type} = ${monthly_cost:.2f}/month")
print(f"\nTotal wasted spend on unattached volumes: ${total_cost:.2f}/month")
gp3 vs gp2: gp3 volumes are 20% cheaper than gp2 and provide better performance. You can provision IOPS and throughput independently.
## Modify gp2 volume to gp3
aws ec2 modify-volume \
--volume-id vol-0123456789abcdef0 \
--volume-type gp3 \
--iops 3000 \
--throughput 125
Did You Know? S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns with no retrieval fees. It's perfect when you can't predict access patterns.
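To go beyond the automatic frequent/infrequent tiers, you opt the bucket into the optional archive tiers with a configuration like the sketch below (the bucket name is a placeholder). Objects still need to be uploaded or transitioned into the INTELLIGENT_TIERING storage class to participate.
## Opt Intelligent-Tiering objects into the optional archive tiers
import boto3

s3 = boto3.client('s3')

s3.put_bucket_intelligent_tiering_configuration(
    Bucket='my-data-bucket',
    Id='archive-after-90-days',
    IntelligentTieringConfiguration={
        'Id': 'archive-after-90-days',
        'Status': 'Enabled',
        'Tierings': [
            {'Days': 90, 'AccessTier': 'ARCHIVE_ACCESS'},
            {'Days': 180, 'AccessTier': 'DEEP_ARCHIVE_ACCESS'}
        ]
    }
)

## Upload objects directly into the Intelligent-Tiering class, e.g.:
## s3.put_object(Bucket='my-data-bucket', Key='data.csv', Body=data,
##               StorageClass='INTELLIGENT_TIERING')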
Performance Engineering Fundamentals
5. Compute Performance Optimization
Instance Family Selection:
| Family | Code | Optimized For | Example Use Cases |
|---|---|---|---|
| General Purpose | T, M, A | Balanced CPU/memory | Web servers, small databases |
| Compute Optimized | C | High CPU performance | Batch processing, gaming, HPC |
| Memory Optimized | R, X, Z | Large in-memory datasets | Caching, in-memory databases, big data |
| Storage Optimized | I, D, H | High disk I/O | NoSQL databases, data warehouses |
| Accelerated Computing | P, G, F, Inf | GPU/FPGA workloads | ML training, graphics rendering |
Burstable Instances (T-series) accumulate CPU credits when idle and consume them during bursts:
## Monitor T-instance CPU credit balance
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
response = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUCreditBalance',
Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
StartTime=datetime.utcnow() - timedelta(hours=24),
EndTime=datetime.utcnow(),
Period=3600,
Statistics=['Average']
)
datapoints = sorted(response['Datapoints'], key=lambda d: d['Timestamp'])
current_credits = datapoints[-1]['Average'] if datapoints else 0
print(f"Current CPU Credit Balance: {current_credits:.0f}")
if current_credits < 50:
    print("⚠️ Low credit balance! Consider switching to unlimited mode or M-series")
Enhanced Networking provides higher bandwidth, higher packet-per-second performance, and lower latency:
## Enable ENA (Elastic Network Adapter) on an instance
aws ec2 modify-instance-attribute \
--instance-id i-0123456789abcdef0 \
--ena-support
Placement Groups control instance placement for performance:
- Cluster: Low latency, high throughput (same AZ)
- Partition: Isolated failure domains (different hardware)
- Spread: Each instance on separate hardware (max 7 per AZ)
## Create a cluster placement group
import boto3
ec2 = boto3.client('ec2')
ec2.create_placement_group(
GroupName='my-hpc-cluster',
Strategy='cluster'
)
## Launch instances into the placement group
ec2.run_instances(
ImageId='ami-0abcdef1234567890',
InstanceType='c5n.18xlarge',
MinCount=3,
MaxCount=3,
Placement={
'GroupName': 'my-hpc-cluster'
}
)
6. Database Performance Tuning
RDS Performance Insights identifies performance bottlenecks:
## Enable Performance Insights on RDS instance
import boto3
rds = boto3.client('rds')
rds.modify_db_instance(
DBInstanceIdentifier='my-database',
EnablePerformanceInsights=True,
PerformanceInsightsRetentionPeriod=7 # days
)
Performance Insights shows:
- Top SQL queries by load
- Wait events (I/O, CPU, locks)
- Database load over time
Read Replicas offload read traffic from primary:
## Create read replica
rds.create_db_instance_read_replica(
DBInstanceIdentifier='my-db-replica',
SourceDBInstanceIdentifier='my-database',
DBInstanceClass='db.r5.large',
AvailabilityZone='us-east-1b',
PubliclyAccessible=False
)
Aurora Serverless v2 auto-scales compute capacity:
## Create Aurora Serverless v2 cluster
rds.create_db_cluster(
DBClusterIdentifier='my-serverless-cluster',
Engine='aurora-mysql',
EngineVersion='8.0.mysql_aurora.3.02.0',
ServerlessV2ScalingConfiguration={
'MinCapacity': 0.5, # ACUs (Aurora Capacity Units)
'MaxCapacity': 2.0
},
MasterUsername='admin',
MasterUserPassword='SecurePassword123!',
DatabaseName='myapp'
)
## Note: Aurora Serverless v2 also needs at least one DB instance with
## DBInstanceClass='db.serverless' added to this cluster to provide compute
DynamoDB Performance:
## Enable auto-scaling for DynamoDB table
import boto3
application_autoscaling = boto3.client('application-autoscaling')
## Register table as scalable target
application_autoscaling.register_scalable_target(
ServiceNamespace='dynamodb',
ResourceId='table/my-table',
ScalableDimension='dynamodb:table:ReadCapacityUnits',
MinCapacity=5,
MaxCapacity=100
)
## Create scaling policy
application_autoscaling.put_scaling_policy(
PolicyName='my-table-read-scaling',
ServiceNamespace='dynamodb',
ResourceId='table/my-table',
ScalableDimension='dynamodb:table:ReadCapacityUnits',
PolicyType='TargetTrackingScaling',
TargetTrackingScalingPolicyConfiguration={
'TargetValue': 70.0, # Target 70% utilization
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
},
'ScaleOutCooldown': 60,
'ScaleInCooldown': 60
}
)
DynamoDB Accelerator (DAX) provides in-memory caching:
## Create DAX cluster
import boto3
dax = boto3.client('dax')
dax.create_cluster(
ClusterName='my-dax-cluster',
NodeType='dax.r5.large',
ReplicationFactor=3,
IamRoleArn='arn:aws:iam::123456789012:role/DAXServiceRole',
SubnetGroupName='my-subnet-group'
)
DAX reduces read latency from milliseconds to microseconds for repeated queries.
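Applications don't reach the cache through the regular DynamoDB endpoint; they connect to the cluster's discovery endpoint using the DAX SDK client. A sketch for looking that endpoint up once the cluster is available:
## Look up the DAX cluster endpoint for the application to connect to
import boto3

dax = boto3.client('dax')

cluster = dax.describe_clusters(ClusterNames=['my-dax-cluster'])['Clusters'][0]
endpoint = cluster['ClusterDiscoveryEndpoint']
print(f"DAX endpoint: {endpoint['Address']}:{endpoint['Port']}")
## The app then uses the DAX client library (e.g. amazon-dax-client for Python)
## pointed at this endpoint; reads are served from the in-memory cache when possible.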
7. Network Performance Optimization
CloudFront CDN caches content at edge locations:
## Create CloudFront distribution
import boto3
cloudfront = boto3.client('cloudfront')
distribution = cloudfront.create_distribution(
DistributionConfig={
'CallerReference': 'my-distribution-2024',
'Origins': {
'Quantity': 1,
'Items': [
{
'Id': 's3-origin',
'DomainName': 'my-bucket.s3.amazonaws.com',
'S3OriginConfig': {
'OriginAccessIdentity': ''
}
}
]
},
'DefaultCacheBehavior': {
'TargetOriginId': 's3-origin',
'ViewerProtocolPolicy': 'redirect-to-https',
'AllowedMethods': {
'Quantity': 2,
'Items': ['GET', 'HEAD']
},
'Compress': True,
'MinTTL': 0,
'DefaultTTL': 86400, # 24 hours
'MaxTTL': 31536000, # 1 year
'ForwardedValues': {
'QueryString': False,
'Cookies': {'Forward': 'none'}
}
},
'Comment': 'Static content distribution',
'Enabled': True
}
)
Global Accelerator routes user traffic onto the AWS global network:
## Create Global Accelerator
import boto3

## The Global Accelerator API is served from the us-west-2 region
globalaccelerator = boto3.client('globalaccelerator', region_name='us-west-2')
accelerator = globalaccelerator.create_accelerator(
Name='my-accelerator',
IpAddressType='IPV4',
Enabled=True
)
## Add listener
listener = globalaccelerator.create_listener(
AcceleratorArn=accelerator['Accelerator']['AcceleratorArn'],
PortRanges=[{'FromPort': 80, 'ToPort': 80}],
Protocol='TCP'
)
Global Accelerator improves performance by:
- Routing traffic over AWS backbone (not public internet)
- Providing static anycast IPs
- Automatic failover to healthy endpoints
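To finish the setup, the listener needs an endpoint group pointing at your actual endpoints, for example an Application Load Balancer; the ALB ARN below is a placeholder. A sketch continuing the snippet above:
## Attach an Application Load Balancer to the accelerator's listener
endpoint_group = globalaccelerator.create_endpoint_group(
    ListenerArn=listener['Listener']['ListenerArn'],
    EndpointGroupRegion='us-east-1',
    EndpointConfigurations=[
        {
            'EndpointId': 'arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123',  # placeholder ALB ARN
            'Weight': 128,
            'ClientIPPreservationEnabled': True
        }
    ],
    HealthCheckIntervalSeconds=30,
    ThresholdCount=3
)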
CONTENT DELIVERY DECISION TREE

Need caching?
├── YES
│   ├── Static content? YES → CloudFront
│   └── Static content? NO  → API Gateway (caching)
└── NO
    ├── Real-time? YES → Global Accelerator
    └── Real-time? NO  → Direct to origin
Practical Examples with Real-World Scenarios
Example 1: E-commerce Platform Cost Optimization
Scenario: An e-commerce company runs 50 m5.large EC2 instances 24/7 for its web tier, at an on-demand cost of roughly $3,650/month.
Analysis:
- Traffic peaks during business hours (9am-9pm)
- Average CPU utilization: 25%
- Peak CPU: 60%
Optimization Strategy:
## Step 1: Analyze current usage
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
## Get average CPU across all instances
response = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'web-tier-asg'}],
StartTime=datetime.utcnow() - timedelta(days=30),
EndTime=datetime.utcnow(),
Period=3600,
Statistics=['Average', 'Maximum']
)
## Step 2: Implement changes
ec2 = boto3.client('ec2')
autoscaling = boto3.client('autoscaling')
## Purchase Reserved Instances for baseline (20 instances)
ec2.purchase_reserved_instances_offering(
InstanceCount=20,
ReservedInstancesOfferingId='offering-12345678', # m5.large, 1-year, partial upfront
)
## Configure Auto Scaling for variable load (10-40 instances)
autoscaling.put_scaling_policy(
AutoScalingGroupName='web-tier-asg',
PolicyName='target-tracking-scaling',
PolicyType='TargetTrackingScaling',
TargetTrackingConfiguration={
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'ASGAverageCPUUtilization'
},
'TargetValue': 50.0
}
)
## Step 3: Add Spot Instances for additional capacity
autoscaling.create_auto_scaling_group(
AutoScalingGroupName='web-tier-spot-asg',
MixedInstancesPolicy={
'InstancesDistribution': {
'OnDemandBaseCapacity': 0,
'OnDemandPercentageAboveBaseCapacity': 20, # 20% on-demand, 80% spot
'SpotAllocationStrategy': 'capacity-optimized'
},
'LaunchTemplate': {
'LaunchTemplateSpecification': {
'LaunchTemplateId': 'lt-0123456789abcdef0',
'Version': '$Latest'
},
'Overrides': [
{'InstanceType': 'm5.large'},
{'InstanceType': 'm5a.large'}, # Alternative for better spot availability
{'InstanceType': 'm5n.large'}
]
}
},
MinSize=10,
MaxSize=40,
DesiredCapacity=20,
VPCZoneIdentifier='subnet-1,subnet-2,subnet-3'
)
Results:
- 20 Reserved Instances covering the steady baseline: ~$1,460/month at the on-demand-equivalent rate, lowered further by the RI discount
- 10-30 instances via Auto Scaling (mix of On-Demand and Spot): ~$800/month average
- Total cost: ~$2,260/month (roughly a 38% reduction)
- Annual savings: roughly $16,680
Example 2: Media Processing Pipeline Performance
Scenario: Video transcoding pipeline processes 1,000 videos/day. Current processing time: 4 hours average per batch.
Bottleneck Analysis:
- Using c5.2xlarge instances (8 vCPU)
- Single-threaded processing
- No parallelization
Optimization:
## Implement parallel processing with AWS Batch
import boto3
import json
batch = boto3.client('batch')
s3 = boto3.client('s3')
## Define compute environment with Spot instances
compute_env = batch.create_compute_environment(
computeEnvironmentName='video-transcoding-spot',
type='MANAGED',
state='ENABLED',
computeResources={
'type': 'SPOT',
'allocationStrategy': 'SPOT_CAPACITY_OPTIMIZED',
'minvCpus': 0,
'maxvCpus': 256,
'desiredvCpus': 0,
'instanceTypes': ['c5', 'c5n', 'c5a'], # Multiple types for better availability
'subnets': ['subnet-1', 'subnet-2', 'subnet-3'],
'securityGroupIds': ['sg-0123456789abcdef0'],
'instanceRole': 'arn:aws:iam::123456789012:instance-profile/ecsInstanceRole',
'bidPercentage': 70, # Pay up to 70% of on-demand price
'spotIamFleetRole': 'arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role'
}
)
## Create job queue
job_queue = batch.create_job_queue(
jobQueueName='video-transcoding-queue',
state='ENABLED',
priority=100,
computeEnvironmentOrder=[
{
'order': 1,
'computeEnvironment': 'video-transcoding-spot'
}
]
)
## Define job that processes single video
job_definition = batch.register_job_definition(
jobDefinitionName='transcode-video',
type='container',
containerProperties={
'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/video-transcoder:latest',
'vcpus': 4,
'memory': 8192,
'command': ['python', 'transcode.py', 'Ref::input_file', 'Ref::output_file'],
'jobRoleArn': 'arn:aws:iam::123456789012:role/BatchJobRole'
},
retryStrategy={
'attempts': 3,
'evaluateOnExit': [
{
'onStatusReason': 'Host EC2*', # Retry on spot interruption
'action': 'RETRY'
}
]
}
)
## Submit jobs in parallel
videos = s3.list_objects_v2(Bucket='input-videos', Prefix='pending/').get('Contents', [])
for video in videos:
    batch.submit_job(
        jobName=f"transcode-{video['Key'].split('/')[-1]}",
        jobQueue='video-transcoding-queue',
        jobDefinition='transcode-video',
        containerOverrides={
            'command': [
                'python', 'transcode.py',
                f"s3://input-videos/{video['Key']}",
                f"s3://output-videos/{video['Key']}"
            ]
        }
    )
print(f"Submitted {len(videos)} transcoding jobs")
Results:
- Processing time: 4 hours → 30 minutes (8x faster)
- Cost: 85% lower using Spot instances
- Automatic scaling: 0 instances when idle
- Fault tolerance: Automatic retry on spot interruption
Example 3: Database Performance Tuning
Scenario: RDS PostgreSQL database experiencing slow queries. Average response time: 500ms, peak: 2 seconds.
Investigation:
## Enable Performance Insights and analyze
import boto3
import json
pi = boto3.client('pi') # Performance Insights
## Get top SQL queries by load
response = pi.get_resource_metrics(
ServiceType='RDS',
Identifier='db-ABCDEFGHIJKLMNOPQRS',
MetricQueries=[
{
'Metric': 'db.load.avg',
'GroupBy': {
'Group': 'db.sql'
}
}
],
StartTime='2024-01-15T00:00:00Z',
EndTime='2024-01-15T23:59:59Z',
PeriodInSeconds=3600
)
## Identify slow queries
for metric in response['MetricList']:
    dims = metric['Key'].get('Dimensions', {})
    statement = dims.get('db.sql.statement', '(total)')
    values = [dp['Value'] for dp in metric['DataPoints'] if dp.get('Value') is not None]
    avg_load = sum(values) / len(values) if values else 0.0
    print(f"Query: {statement[:100]}...")
    print(f"Average Load: {avg_load:.2f}")
    print("---")
Optimization Steps:
-- Step 1: Add missing indexes (identified from Performance Insights)
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date DESC);
CREATE INDEX idx_products_category
ON products(category_id)
WHERE active = true;
-- Step 2: Optimize slow query
-- Before (2 seconds):
SELECT o.*, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date > NOW() - INTERVAL '30 days';
-- After (50ms): Use CTEs and limit joins
WITH recent_orders AS (
SELECT * FROM orders
WHERE order_date > NOW() - INTERVAL '30 days'
AND customer_id IN (SELECT id FROM customers WHERE active = true)
)
SELECT ro.*, c.name
FROM recent_orders ro
JOIN customers c ON ro.customer_id = c.id;
## Step 3: Implement read replica for reporting queries
import boto3
rds = boto3.client('rds')
## Create read replica
replica = rds.create_db_instance_read_replica(
DBInstanceIdentifier='mydb-read-replica',
SourceDBInstanceIdentifier='mydb-primary',
DBInstanceClass='db.r5.xlarge', # Memory-optimized for caching
PubliclyAccessible=False,
Tags=[
{'Key': 'Purpose', 'Value': 'reporting'},
{'Key': 'Environment', 'Value': 'production'}
]
)
## Update application to route read queries to replica
## In application config:
## DB_WRITE_ENDPOINT = 'mydb-primary.abcdef.us-east-1.rds.amazonaws.com'
## DB_READ_ENDPOINT = 'mydb-read-replica.abcdef.us-east-1.rds.amazonaws.com'
Results:
- Average response time: 500ms → 80ms (6.25x faster)
- Peak response time: 2s → 200ms (10x faster)
- Primary database CPU: 75% → 40%
- Minimal application change: read-only queries simply point at the replica endpoint
Example 4: Serverless Architecture Cost Comparison
Scenario: API backend serving 10 million requests/month.
Option A: EC2-based (Current)
## Always-on instances
## 3x m5.large instances behind ALB
## Cost: $219/month (instances) + $23/month (ALB) = $242/month
Option B: Serverless (Proposed)
## AWS Lambda + API Gateway
import json
import boto3

# Create the DynamoDB resource at module scope so it is reused across invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

def lambda_handler(event, context):
    # Look up the requested user
    user_id = event['pathParameters']['userId']
    response = table.get_item(Key={'userId': user_id})
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(response.get('Item', {}))
    }
## Cost calculation:
## Lambda: 10M requests × $0.20/1M = $2.00
## Lambda compute: 10M × 200ms × $0.0000166667/GB-sec = $33.33 (1GB memory)
## API Gateway: 10M requests × $3.50/1M = $35.00
## DynamoDB: 10M reads × $0.25/1M (on-demand) = $2.50
## Total: $72.83/month (70% savings)
Cost Comparison:
| Component | EC2 Option | Serverless Option |
|---|---|---|
| Compute | $219/month | $35.33/month |
| Load Balancer / API Gateway | $23/month | $35/month |
| Database | RDS: $115/month | DynamoDB: $2.50/month |
| Total Monthly | $357 | $72.83 |
| Annual Savings (switching to serverless) | | $3,410 |
Additional Benefits:
- Zero server management
- Automatic scaling
- Pay only for actual usage
- Built-in high availability
⚠️ Consideration: Serverless works best for sporadic or variable workloads. For consistent high-volume traffic (>50M requests/month), EC2 with Reserved Instances may be more cost-effective; the rough break-even sketch below shows why.
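A minimal break-even sketch, reusing the illustrative rates from the comparison above (Lambda and API Gateway request pricing, 200ms at 1GB per request):
## Rough break-even: Lambda + API Gateway vs a fixed EC2 fleet (illustrative rates)
EC2_FIXED_MONTHLY = 242.0  # 3x m5.large + ALB, runs regardless of traffic

def serverless_monthly(requests_millions, avg_ms=200, memory_gb=1.0):
    lambda_requests = requests_millions * 0.20                      # $0.20 per 1M requests
    lambda_compute = (requests_millions * 1_000_000
                      * (avg_ms / 1000) * memory_gb * 0.0000166667) # GB-second pricing
    api_gateway = requests_millions * 3.50                          # $3.50 per 1M requests
    return lambda_requests + lambda_compute + api_gateway

for millions in (1, 10, 25, 50, 100):
    cost = serverless_monthly(millions)
    winner = 'serverless' if cost < EC2_FIXED_MONTHLY else 'EC2'
    print(f"{millions:>4}M requests/month: serverless ≈ ${cost:,.2f} -> {winner} is cheaper")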
Common Mistakes & How to Avoid Them
Mistake 1: Not Using Tags for Cost Allocation
❌ Wrong:
## Launch instance without tags
ec2.run_instances(
ImageId='ami-0abcdef1234567890',
InstanceType='t3.medium',
MinCount=1,
MaxCount=1
)
✅ Right:
## Launch with comprehensive tags
ec2.run_instances(
ImageId='ami-0abcdef1234567890',
InstanceType='t3.medium',
MinCount=1,
MaxCount=1,
TagSpecifications=[
{
'ResourceType': 'instance',
'Tags': [
{'Key': 'Name', 'Value': 'web-server-01'},
{'Key': 'Environment', 'Value': 'production'},
{'Key': 'Project', 'Value': 'customer-portal'},
{'Key': 'CostCenter', 'Value': 'engineering'},
{'Key': 'Owner', 'Value': 'alice@example.com'},
{'Key': 'AutoShutdown', 'Value': 'false'}
]
}
]
)
Why it matters: Without tags, you can't track spending by team, project, or environment. Cost allocation becomes impossible.
Mistake 2: Leaving Resources Running 24/7
❌ Wrong: Development and test environments run continuously, even nights and weekends.
✅ Right:
## Lambda function to stop dev/test instances nights and weekends
import boto3
from datetime import datetime
def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Get current time (Lambda clocks run in UTC; adjust the hours for your timezone)
    now = datetime.now()
    hour = now.hour
    weekday = now.weekday()  # 0=Monday, 6=Sunday

    # Define schedule: stop 7PM-7AM weekdays, all day weekends
    should_stop = (
        hour < 7 or hour >= 19  # Outside 7AM-7PM
    ) or (
        weekday >= 5  # Weekend (Sat/Sun)
    )

    if should_stop:
        # Find running instances tagged for auto-shutdown
        instances = ec2.describe_instances(
            Filters=[
                {'Name': 'tag:AutoShutdown', 'Values': ['true']},
                {'Name': 'instance-state-name', 'Values': ['running']}
            ]
        )
        instance_ids = []
        for reservation in instances['Reservations']:
            for instance in reservation['Instances']:
                instance_ids.append(instance['InstanceId'])

        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
            print(f"Stopped {len(instance_ids)} instances")

    return {'statusCode': 200}
## Schedule with EventBridge: cron(0 7,19 * * ? *)
## Runs at 7AM and 7PM daily
Savings: roughly 65% reduction for dev/test environments (running about 60 hours/week instead of 168).
Mistake 3: Not Monitoring Burst Credits on T-Instances
❌ Wrong: Using t3.medium for an application that consistently needs high CPU, leading to credit exhaustion and throttling.
✅ Right:
## Create CloudWatch alarm for low CPU credits
import boto3
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_alarm(
AlarmName='low-cpu-credits-web-01',
ComparisonOperator='LessThanThreshold',
EvaluationPeriods=2,
MetricName='CPUCreditBalance',
Namespace='AWS/EC2',
Period=300,
Statistic='Average',
Threshold=100.0, # Alert when credits drop below 100
ActionsEnabled=True,
AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
AlarmDescription='Alert when T-instance running low on CPU credits',
Dimensions=[
{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}
]
)
Better Solution: Switch to M-series for consistent CPU needs, or enable T3 Unlimited mode:
aws ec2 modify-instance-credit-specification \
    --instance-credit-specifications "InstanceId=i-0123456789abcdef0,CpuCredits=unlimited"
Mistake 4: Using On-Demand for Predictable Workloads
❌ Wrong:
## Running the database at on-demand pricing
## db.r5.xlarge on-demand (illustrative rate): $0.252/hour = $183.96/month
✅ Right:
## Purchase a Reserved Instance for the database
## db.r5.xlarge 1-year partial upfront (illustrative rate): $0.155/hour = $113.15/month
## Savings: $70.81/month (~$850/year)
rds = boto3.client('rds')
## Purchase RDS Reserved Instance
response = rds.purchase_reserved_db_instances_offering(
ReservedDBInstancesOfferingId='offering-12345678',
ReservedDBInstanceId='my-reserved-db',
DBInstanceCount=1
)
Mistake 5: Not Using S3 Lifecycle Policies
❌ Wrong: Storing all logs in S3 Standard forever.
✅ Right:
{
"Rules": [
{
"Id": "intelligent-log-management",
"Status": "Enabled",
"Filter": {"Prefix": "logs/"},
"Transitions": [
{"Days": 30, "StorageClass": "STANDARD_IA"},
{"Days": 90, "StorageClass": "GLACIER_IR"},
{"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
],
"Expiration": {"Days": 2555},
"NoncurrentVersionExpiration": {"NoncurrentDays": 30}
}
]
}
Savings Example (per 1TB batch of logs retained for 7 years, approximate us-east-1 rates):
- S3 Standard the whole time: $0.023/GB ≈ $23/month × 84 months ≈ $1,930
- With the lifecycle policy: ~$23 (month 1, Standard) + ~$13/month (months 2-3, Standard-IA) + ~$4/month (months 4-12, Glacier IR) + ~$1/month (years 2-7, Deep Archive) ≈ $160 total (over 90% savings)
Mistake 6: Ignoring Network Transfer Costs
❌ Wrong: Transferring data between regions unnecessarily.
## Application in us-east-1 reading from S3 in eu-west-1
## 10TB/month × $0.02/GB = $204/month in data transfer fees
✅ Right:
## Use S3 Cross-Region Replication to keep data local
import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
Bucket='my-source-bucket',
ReplicationConfiguration={
'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
'Rules': [
{
'ID': 'replicate-to-us-east-1',
'Status': 'Enabled',
'Priority': 1,
'Filter': {'Prefix': ''},
'DeleteMarkerReplication': {'Status': 'Disabled'},
'Destination': {
'Bucket': 'arn:aws:s3:::my-replica-bucket-us-east-1',
'StorageClass': 'STANDARD_IA' # Use cheaper storage class
}
}
]
}
)
## Application reads from local replica
## One-time replication of the existing 10TB: 10TB × $0.02/GB = $204
## Ongoing: reads are local and free; only newly written objects incur replication transfer
Key Takeaways
Quick Reference Card
Cost Optimization Priorities:
- Right-size first - 30-50% savings typically available
- Use Reserved Instances/Savings Plans - For predictable workloads (up to 72% off)
- Implement Auto Scaling - Pay only for what you need
- Leverage Spot Instances - For fault-tolerant workloads (up to 90% off)
- Apply S3 Lifecycle Policies - Automatically tier storage (up to 95% off)
- Tag everything - Enable cost allocation and tracking
- Set up billing alerts - Catch anomalies early
Performance Optimization Checklist:
- Use appropriate instance family (C/R/M/I/P)
- Enable Enhanced Networking for latency-sensitive apps
- Implement caching layers (ElastiCache, DAX, CloudFront)
- Use read replicas for read-heavy databases
- Deploy resources in multiple AZs for high availability
- Enable CloudWatch detailed monitoring
- Use Placement Groups for tightly coupled workloads
- Optimize database queries and add indexes
Cost & Performance Tools:
| Tool | Purpose | Key Metric |
|---|---|---|
| Cost Explorer | Spending analysis | Monthly cost trends |
| AWS Budgets | Cost alerts | Budget vs actual |
| Compute Optimizer | Right-sizing | Utilization % |
| Trusted Advisor | Best practices | Checks passed/failed |
| CloudWatch | Performance monitoring | Resource metrics |
| Performance Insights | Database tuning | Query load |
Golden Rules:
- Measure before optimizing - Get baseline metrics
- One change at a time - Isolate impact
- Automate everything - Reduce human error
- Review monthly - Costs and performance drift
- Test in non-prod first - Validate changes safely
Memory Device - CRAP²: Cache aggressively, Right-size resources, Automate scaling, Plan for reserved capacity, Performance test continuously.
Final Tip: The AWS Well-Architected Tool provides free assessments. Use it quarterly to identify cost and performance optimization opportunities.
Further Study
AWS Well-Architected Framework - Cost Optimization Pillar
https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html
AWS Cost Optimization Best Practices
https://aws.amazon.com/pricing/cost-optimization/
Amazon CloudWatch User Guide - Performance Metrics
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
Congratulations! You now understand the core principles of AWS cost and performance engineering. Practice implementing these techniques in your own AWS environment, starting with the low-hanging fruit: tagging resources, setting up billing alerts, and analyzing your Cost Explorer data. Remember: optimization is an ongoing process, not a one-time project.