
Cost & Performance Engineering

FinOps, cost allocation, tagging, rightsizing, and designing for failure

AWS Cost & Performance Engineering

Master AWS cost optimization and performance engineering with free flashcards and spaced repetition practice. This lesson covers cost monitoring strategies, right-sizing resources, performance tuning techniques, and architectural optimization patterns: essential concepts for the AWS Solutions Architect and SysOps Administrator certifications.

Welcome to Cost & Performance Engineering 💰⚡

In cloud computing, cost optimization and performance engineering are two sides of the same coin. Organizations often overprovision resources "just to be safe," resulting in wasted spend, or undersize resources to save money, creating performance bottlenecks. The art of AWS cost and performance engineering lies in finding the sweet spot where your applications run efficiently at the lowest sustainable cost.

Why This Matters:

  • Industry studies estimate that roughly a third of cloud spend goes to unused or oversized resources
  • Poor performance impacts user experience and business revenue
  • Well-architected systems achieve both cost efficiency and high performance
  • AWS offers over 200 services; choosing the right combination saves money and boosts speed

💡 Pro Tip: The AWS Well-Architected Framework includes both Cost Optimization and Performance Efficiency among its six pillars. Mastering both makes you invaluable to any organization.


Core Concepts: Understanding AWS Cost Structure 💵

1. AWS Pricing Models

AWS offers several pricing models, each suited to different workload patterns:

| Pricing Model | Best For | Savings vs On-Demand | Commitment |
|---|---|---|---|
| On-Demand | Variable workloads, testing | Baseline (0%) | None |
| Reserved Instances | Steady-state workloads | Up to 72% | 1-3 years |
| Savings Plans | Flexible compute usage | Up to 66% | 1-3 years |
| Spot Instances | Fault-tolerant, flexible workloads | Up to 90% | None (can be interrupted) |

Reserved Instances (RIs) provide the deepest discounts but require upfront commitment to specific instance families and regions. Think of them like buying a gym membership: you pay upfront for guaranteed access.
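
Before committing, you can browse the available offerings programmatically. A minimal sketch, where the platform, term, and payment option shown are assumptions for illustration:

## List 1-year partial-upfront offerings for m5.large (illustrative filters)
import boto3

ec2 = boto3.client('ec2')

offerings = ec2.describe_reserved_instances_offerings(
    InstanceType='m5.large',
    ProductDescription='Linux/UNIX',
    OfferingClass='standard',
    OfferingType='Partial Upfront',
    MinDuration=31536000,   # 1 year, in seconds
    MaxDuration=31536000,
    MaxResults=5
)

for offer in offerings['ReservedInstancesOfferings']:
    recurring = [c['Amount'] for c in offer['RecurringCharges']]
    print(offer['ReservedInstancesOfferingId'],
          f"upfront ${offer['FixedPrice']:.2f}",
          f"hourly charges {recurring}")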

Savings Plans offer similar discounts but with more flexibility. You commit to a dollar amount per hour (e.g., $10/hour) rather than specific instance types. The plan automatically applies to any matching compute usage.
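
How the hourly commitment applies, as a back-of-the-envelope sketch; the 30% discount and the usage figure are assumptions, not actual AWS rates:

## Illustrative math for a $10/hour Compute Savings Plan
commitment = 10.00        # USD per hour committed (billed even if usage is lower)
discount = 0.30           # assumed Savings Plan discount vs on-demand
on_demand_cost = 16.00    # this hour's compute usage priced at on-demand rates

sp_cost = on_demand_cost * (1 - discount)   # same usage priced at Savings Plan rates

if sp_cost <= commitment:
    total = commitment                       # plan absorbs all usage; unused commitment is still billed
else:
    covered_fraction = commitment / sp_cost  # share of usage the commitment covers
    overflow = on_demand_cost * (1 - covered_fraction)  # remainder billed at on-demand rates
    total = commitment + overflow

print(f"Hourly bill: ${total:.2f} vs ${on_demand_cost:.2f} pure on-demand")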

Spot Instances leverage AWS's unused capacity. AWS can reclaim them with 2-minute notice, making them perfect for batch processing, CI/CD, data analysis, and containerized workloads with checkpointing.

## Example: Launching a Spot Instance with boto3
import boto3

ec2 = boto3.client('ec2')

response = ec2.request_spot_instances(
    SpotPrice='0.05',
    InstanceCount=1,
    Type='one-time',
    LaunchSpecification={
        'ImageId': 'ami-0abcdef1234567890',
        'InstanceType': 't3.medium',
        'KeyName': 'my-key-pair',
        'SecurityGroupIds': ['sg-0123456789abcdef0'],
        'SubnetId': 'subnet-12345678',
        'IamInstanceProfile': {
            'Name': 'MyInstanceProfile'
        }
    }
)

print(f"Spot request ID: {response['SpotInstanceRequests'][0]['SpotInstanceRequestId']}")

🧠 Memory Device - ROSS: Reserved for steady loads, On-demand for flexibility, Savings Plans for compute flexibility, Spot for interruptible workloads.

2. Cost Monitoring & Visibility Tools 📊

AWS Cost Explorer provides visual analysis of spending patterns over time (see the API sketch after this list). It shows:

  • Historical cost data (up to 12 months)
  • Forecasting for next 12 months
  • Cost breakdown by service, linked account, tag, or region
  • Reserved Instance recommendations
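
A minimal sketch of pulling the same breakdown with the Cost Explorer API - service-level unblended cost for the last 30 days (note that Cost Explorer API calls are billed per request):

import boto3
from datetime import date, timedelta

ce = boto3.client('ce')  # Cost Explorer API

end = date.today()
start = end - timedelta(days=30)

response = ce.get_cost_and_usage(
    TimePeriod={'Start': start.isoformat(), 'End': end.isoformat()},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}]
)

for group in response['ResultsByTime'][0]['Groups']:
    service = group['Keys'][0]
    amount = float(group['Metrics']['UnblendedCost']['Amount'])
    if amount > 0:
        print(f"{service}: ${amount:.2f}")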

AWS Budgets creates custom alerts when costs or usage exceed thresholds:

{
  "Budget": {
    "BudgetName": "Monthly-EC2-Budget",
    "BudgetLimit": {
      "Amount": "1000",
      "Unit": "USD"
    },
    "TimeUnit": "MONTHLY",
    "BudgetType": "COST",
    "CostFilters": {
      "Service": ["Amazon Elastic Compute Cloud - Compute"]
    }
  },
  "NotificationsWithSubscribers": [
    {
      "Notification": {
        "NotificationType": "ACTUAL",
        "ComparisonOperator": "GREATER_THAN",
        "Threshold": 80,
        "ThresholdType": "PERCENTAGE"
      },
      "Subscribers": [
        {
          "SubscriptionType": "EMAIL",
          "Address": "ops-team@example.com"
        }
      ]
    }
  ]
}

AWS Cost and Usage Report (CUR) delivers the most granular data: hourly resource usage with custom tags. It writes to S3 for analysis with Athena, QuickSight, or third-party tools (a sample Athena query sketch follows).
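
For example, with the CUR's Athena integration enabled you can query it directly; the database, table, and results-bucket names below are placeholders, and the columns follow the standard CUR schema:

import boto3

athena = boto3.client('athena')

query = """
SELECT line_item_product_code        AS service,
       SUM(line_item_unblended_cost) AS cost
FROM cur_database.cur_table
WHERE year = '2024' AND month = '1'
GROUP BY line_item_product_code
ORDER BY cost DESC
LIMIT 20
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'cur_database'},
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/cur/'}
)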

AWS Trusted Advisor provides real-time recommendations across several check categories, including cost optimization (a programmatic sketch follows the list):

  • Low utilization EC2 instances
  • Idle RDS databases
  • Unassociated Elastic IPs
  • Underutilized EBS volumes
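
A rough sketch of reading these checks programmatically; note that the Support API requires a Business, Enterprise On-Ramp, or Enterprise support plan, and the 'cost_optimizing' category string is an assumption based on the API's category naming:

import boto3

## The AWS Support API is served from us-east-1
support = boto3.client('support', region_name='us-east-1')

checks = support.describe_trusted_advisor_checks(language='en')['checks']
cost_checks = [c for c in checks if c['category'] == 'cost_optimizing']

for check in cost_checks:
    result = support.describe_trusted_advisor_check_result(
        checkId=check['id'], language='en'
    )['result']
    flagged = result.get('resourcesSummary', {}).get('resourcesFlagged', 0)
    print(f"{check['name']}: {flagged} flagged resources")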

┌─────────────────────────────────────────────────┐
│      COST MONITORING WORKFLOW                   │
└─────────────────────────────────────────────────┘

    📊 Cost Explorer
         │
         ↓
    🎯 Identify high-spend services
         │
         ↓
    🔍 Drill down by tag/resource
         │
         ↓
    💡 Trusted Advisor checks
         │
         ↓
    📋 Create optimization plan
         │
         ↓
    🔔 Set Budget alerts
         │
         ↓
    🔄 Monitor CUR in Athena
         │
         ↓
    ✅ Implement changes
         │
         ↓
    📈 Measure impact

💡 Pro Tip: Tag everything! Use tags like Environment, Project, Owner, and CostCenter to enable detailed cost allocation. AWS supports up to 50 tags per resource.

3. Right-Sizing Resources 📏

Right-sizing means matching instance types and sizes to actual workload requirements. Most organizations overprovision by 30-50%.

CloudWatch Metrics reveal actual resource utilization:

## Check average CPU utilization over 14 days
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[
        {'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}
    ],
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=3600,  # 1 hour intervals
    Statistics=['Average', 'Maximum']
)

avg_cpu = sum([d['Average'] for d in response['Datapoints']]) / len(response['Datapoints'])
max_cpu = max([d['Maximum'] for d in response['Datapoints']])

print(f"Average CPU: {avg_cpu:.2f}%")
print(f"Maximum CPU: {max_cpu:.2f}%")

if avg_cpu < 10:
    print("⚠️ Consider downsizing or stopping this instance")
elif avg_cpu > 70:
    print("⚠️ Consider upsizing to prevent performance issues")

AWS Compute Optimizer uses machine learning to analyze historical utilization and recommend optimal instance types:

## Get recommendations via AWS CLI
aws compute-optimizer get-ec2-instance-recommendations \
  --instance-arns arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0

Output includes:

  • Current instance type and pricing
  • Recommended instance types (up to 3 options)
  • Projected savings
  • Performance risk assessment (Very Low, Low, Medium, High)

| Metric | Target Range | Action if Below | Action if Above |
|---|---|---|---|
| CPU | 40-70% | Downsize | Upsize or scale out |
| Memory | 50-80% | Switch to compute-optimized | Switch to memory-optimized |
| Network | 30-60% | Consider smaller instance | Enable enhanced networking |
| Disk I/O | 40-70% | Use gp3 instead of io2 | Increase IOPS or use io2 |

4. Storage Optimization Strategies 💾

S3 Storage Classes offer tiered pricing based on access patterns:

| Storage Class | Use Case | Availability | Cost (relative) |
|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | $$$$ |
| S3 Intelligent-Tiering | Unknown/changing access patterns | 99.9% | $$$$ (auto-optimized) |
| S3 Standard-IA | Infrequent access | 99.9% | $$$ |
| S3 One Zone-IA | Infrequent, reproducible data | 99.5% | $$ |
| S3 Glacier Instant Retrieval | Archive with millisecond retrieval | 99.9% | $$ |
| S3 Glacier Flexible Retrieval | Archive, minutes-to-hours retrieval | 99.99% | $ |
| S3 Glacier Deep Archive | Long-term archive, ~12 hr retrieval | 99.99% | ¢ |

S3 Lifecycle Policies automatically transition objects between classes:

{
  "Rules": [
    {
      "Id": "LogArchivalPolicy",
      "Status": "Enabled",
      "Filter": {
        "Prefix": "logs/"
      },
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER_IR"
        },
        {
          "Days": 365,
          "StorageClass": "DEEP_ARCHIVE"
        }
      ],
      "Expiration": {
        "Days": 2555
      }
    }
  ]
}

EBS Volume Optimization:

## Identify unattached EBS volumes (wasted cost)
import boto3

ec2 = boto3.client('ec2')

volumes = ec2.describe_volumes(
    Filters=[{'Name': 'status', 'Values': ['available']}]
)['Volumes']

total_cost = 0
for vol in volumes:
    size_gb = vol['Size']
    vol_type = vol['VolumeType']
    
    # Approximate monthly cost (gp3 = $0.08/GB)
    monthly_cost = size_gb * 0.08
    total_cost += monthly_cost
    
    print(f"Volume {vol['VolumeId']}: {size_gb}GB {vol_type} = ${monthly_cost:.2f}/month")

print(f"\nTotal wasted spend on unattached volumes: ${total_cost:.2f}/month")

gp3 vs gp2: gp3 volumes are 20% cheaper than gp2 and provide better performance. You can provision IOPS and throughput independently.

## Modify gp2 volume to gp3
aws ec2 modify-volume \
  --volume-id vol-0123456789abcdef0 \
  --volume-type gp3 \
  --iops 3000 \
  --throughput 125

💡 Did You Know? S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns, with no retrieval fees (a small monthly monitoring and automation charge applies per object). It's perfect when you can't predict access patterns.
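
By default Intelligent-Tiering only moves objects between the frequent and infrequent access tiers; the archive tiers are an opt-in bucket-level configuration. A minimal sketch, with bucket name and prefix as placeholders:

import boto3

s3 = boto3.client('s3')

s3.put_bucket_intelligent_tiering_configuration(
    Bucket='my-bucket',
    Id='archive-cold-logs',
    IntelligentTieringConfiguration={
        'Id': 'archive-cold-logs',
        'Filter': {'Prefix': 'logs/'},
        'Status': 'Enabled',
        'Tierings': [
            {'Days': 90, 'AccessTier': 'ARCHIVE_ACCESS'},
            {'Days': 180, 'AccessTier': 'DEEP_ARCHIVE_ACCESS'}
        ]
    }
)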


Performance Engineering Fundamentals ⚡

5. Compute Performance Optimization

Instance Family Selection:

| Family | Code | Optimized For | Example Use Cases |
|---|---|---|---|
| General Purpose | T, M, A | Balanced CPU/memory | Web servers, small databases |
| Compute Optimized | C | High CPU performance | Batch processing, gaming, HPC |
| Memory Optimized | R, X, Z | Large in-memory datasets | Caching, in-memory databases, big data |
| Storage Optimized | I, D, H | High disk I/O | NoSQL databases, data warehouses |
| Accelerated Computing | P, G, F, Inf | GPU/FPGA workloads | ML training, graphics rendering |

Burstable Instances (T-series) accumulate CPU credits when idle and consume them during bursts:

## Monitor T-instance CPU credit balance
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUCreditBalance',
    Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
    StartTime=datetime.utcnow() - timedelta(hours=24),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=['Average']
)

current_credits = response['Datapoints'][-1]['Average'] if response['Datapoints'] else 0
print(f"Current CPU Credit Balance: {current_credits:.0f}")

if current_credits < 50:
    print("⚠️ Low credit balance! Consider switching to unlimited mode or M-series")

Enhanced Networking provides higher bandwidth, higher packet-per-second performance, and lower latency:

## Enable ENA (Elastic Network Adapter) on an instance
aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --ena-support

Placement Groups control instance placement for performance:

  • Cluster: Low latency, high throughput (same AZ)
  • Partition: Isolated failure domains (different hardware)
  • Spread: Each instance on separate hardware (max 7 per AZ)

## Create a cluster placement group
import boto3

ec2 = boto3.client('ec2')

ec2.create_placement_group(
    GroupName='my-hpc-cluster',
    Strategy='cluster'
)

## Launch instances into the placement group
ec2.run_instances(
    ImageId='ami-0abcdef1234567890',
    InstanceType='c5n.18xlarge',
    MinCount=3,
    MaxCount=3,
    Placement={
        'GroupName': 'my-hpc-cluster'
    }
)

6. Database Performance Tuning 🗄️

RDS Performance Insights identifies performance bottlenecks:

## Enable Performance Insights on RDS instance
import boto3

rds = boto3.client('rds')

rds.modify_db_instance(
    DBInstanceIdentifier='my-database',
    EnablePerformanceInsights=True,
    PerformanceInsightsRetentionPeriod=7  # days
)

Performance Insights shows:

  • Top SQL queries by load
  • Wait events (I/O, CPU, locks)
  • Database load over time

Read Replicas offload read traffic from primary:

## Create read replica
rds.create_db_instance_read_replica(
    DBInstanceIdentifier='my-db-replica',
    SourceDBInstanceIdentifier='my-database',
    DBInstanceClass='db.r5.large',
    AvailabilityZone='us-east-1b',
    PubliclyAccessible=False
)

Aurora Serverless v2 auto-scales compute capacity:

## Create Aurora Serverless v2 cluster
rds.create_db_cluster(
    DBClusterIdentifier='my-serverless-cluster',
    Engine='aurora-mysql',
    EngineVersion='8.0.mysql_aurora.3.02.0',
    ServerlessV2ScalingConfiguration={
        'MinCapacity': 0.5,  # ACUs (Aurora Capacity Units)
        'MaxCapacity': 2.0
    },
    MasterUsername='admin',
    MasterUserPassword='SecurePassword123!',
    DatabaseName='myapp'
)
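
Note: a Serverless v2 cluster serves traffic through DB instances of class db.serverless, so you also need to create at least one such instance in the cluster (for example, via create_db_instance with DBInstanceClass='db.serverless') before it can accept connections.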

DynamoDB Performance:

## Enable auto-scaling for DynamoDB table
import boto3

application_autoscaling = boto3.client('application-autoscaling')

## Register table as scalable target
application_autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/my-table',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=5,
    MaxCapacity=100
)

## Create scaling policy
application_autoscaling.put_scaling_policy(
    PolicyName='my-table-read-scaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/my-table',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,  # Target 70% utilization
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
        },
        'ScaleOutCooldown': 60,
        'ScaleInCooldown': 60
    }
)

DynamoDB Accelerator (DAX) provides in-memory caching:

## Create DAX cluster
import boto3

dax = boto3.client('dax')

dax.create_cluster(
    ClusterName='my-dax-cluster',
    NodeType='dax.r5.large',
    ReplicationFactor=3,
    IamRoleArn='arn:aws:iam::123456789012:role/DAXServiceRole',
    SubnetGroupName='my-subnet-group'
)

DAX reduces read latency from milliseconds to microseconds for repeated queries.

7. Network Performance Optimization 🌐

CloudFront CDN caches content at edge locations:

## Create CloudFront distribution
import boto3

cloudfront = boto3.client('cloudfront')

distribution = cloudfront.create_distribution(
    DistributionConfig={
        'CallerReference': 'my-distribution-2024',
        'Origins': {
            'Quantity': 1,
            'Items': [
                {
                    'Id': 's3-origin',
                    'DomainName': 'my-bucket.s3.amazonaws.com',
                    'S3OriginConfig': {
                        'OriginAccessIdentity': ''
                    }
                }
            ]
        },
        'DefaultCacheBehavior': {
            'TargetOriginId': 's3-origin',
            'ViewerProtocolPolicy': 'redirect-to-https',
            'AllowedMethods': {
                'Quantity': 2,
                'Items': ['GET', 'HEAD']
            },
            'Compress': True,
            'MinTTL': 0,
            'DefaultTTL': 86400,  # 24 hours
            'MaxTTL': 31536000,   # 1 year
            'ForwardedValues': {
                'QueryString': False,
                'Cookies': {'Forward': 'none'}
            }
        },
        'Enabled': True
    }
)

Global Accelerator uses AWS global network:

## Create Global Accelerator
globalaccelerator = boto3.client('globalaccelerator')

accelerator = globalaccelerator.create_accelerator(
    Name='my-accelerator',
    IpAddressType='IPV4',
    Enabled=True
)

## Add listener
listener = globalaccelerator.create_listener(
    AcceleratorArn=accelerator['Accelerator']['AcceleratorArn'],
    PortRanges=[{'FromPort': 80, 'ToPort': 80}],
    Protocol='TCP'
)

Global Accelerator improves performance by (see the endpoint group sketch after this list):

  • Routing traffic over AWS backbone (not public internet)
  • Providing static anycast IPs
  • Automatic failover to healthy endpoints
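
Continuing the listener example above, a hedged sketch of attaching an endpoint group so traffic actually reaches your application; the load balancer ARN is a placeholder:

## Attach an Application Load Balancer to the listener created above
globalaccelerator.create_endpoint_group(
    ListenerArn=listener['Listener']['ListenerArn'],
    EndpointGroupRegion='us-east-1',
    EndpointConfigurations=[
        {
            'EndpointId': 'arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/0123456789abcdef',
            'Weight': 128,
            'ClientIPPreservationEnabled': True
        }
    ],
    HealthCheckProtocol='TCP',
    HealthCheckPort=80
)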

┌─────────────────────────────────────────────────┐
│   CONTENT DELIVERY DECISION TREE                │
└─────────────────────────────────────────────────┘

          Need caching?
               │
       ┌───────┴────────┐
       │                │
      YES               NO
       │                │
       ↓                ↓
   Static content?   Real-time?
       │                │
   ┌───┴───┐        ┌───┴───┐
  YES     NO       YES     NO
   │      │         │       │
   ↓      ↓         ↓       ↓
 CloudFront  API    Global       Direct
             Gateway Accelerator  to origin
             (caching)

Practical Examples with Real-World Scenarios 🔧

Example 1: E-commerce Platform Cost Optimization

Scenario: An e-commerce company runs 50 m5.large EC2 instances 24/7 for their web tier. Monthly cost is $3,650.

Analysis:

  • Traffic peaks during business hours (9am-9pm)
  • Average CPU utilization: 25%
  • Peak CPU: 60%

Optimization Strategy:

## Step 1: Analyze current usage
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

## Get average CPU across all instances
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'web-tier-asg'}],
    StartTime=datetime.utcnow() - timedelta(days=30),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=['Average', 'Maximum']
)

## Step 2: Implement changes
ec2 = boto3.client('ec2')
autoscaling = boto3.client('autoscaling')

## Purchase Reserved Instances for baseline (20 instances)
ec2.purchase_reserved_instances_offering(
    InstanceCount=20,
    ReservedInstancesOfferingId='offering-12345678',  # m5.large, 1-year, partial upfront
)

## Configure Auto Scaling for variable load (10-40 instances)
autoscaling.put_scaling_policy(
    AutoScalingGroupName='web-tier-asg',
    PolicyName='target-tracking-scaling',
    PolicyType='TargetTrackingScaling',
    TargetTrackingConfiguration={
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'ASGAverageCPUUtilization'
        },
        'TargetValue': 50.0
    }
)

## Step 3: Add Spot Instances for additional capacity
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName='web-tier-spot-asg',
    MixedInstancesPolicy={
        'InstancesDistribution': {
            'OnDemandBaseCapacity': 0,
            'OnDemandPercentageAboveBaseCapacity': 20,  # 20% on-demand, 80% spot
            'SpotAllocationStrategy': 'capacity-optimized'
        },
        'LaunchTemplate': {
            'LaunchTemplateSpecification': {
                'LaunchTemplateId': 'lt-0123456789abcdef0',
                'Version': '$Latest'
            },
            'Overrides': [
                {'InstanceType': 'm5.large'},
                {'InstanceType': 'm5a.large'},  # Alternative for better spot availability
                {'InstanceType': 'm5n.large'}
            ]
        }
    },
    MinSize=10,
    MaxSize=40,
    DesiredCapacity=20,
    VPCZoneIdentifier='subnet-1,subnet-2,subnet-3'
)

Results:

  • 20 Reserved Instances (baseline): ~$1,460/month
  • 10-30 Auto Scaling instances (mix of on-demand/spot): ~$800/month average
  • Total cost: ~$2,260/month (38% reduction)
  • Annual savings: $16,680

Example 2: Media Processing Pipeline Performance

Scenario: Video transcoding pipeline processes 1,000 videos/day. Current processing time: 4 hours average per batch.

Bottleneck Analysis:

  • Using c5.2xlarge instances (8 vCPU)
  • Single-threaded processing
  • No parallelization

Optimization:

## Implement parallel processing with AWS Batch
import boto3
import json

batch = boto3.client('batch')
s3 = boto3.client('s3')

## Define compute environment with Spot instances
compute_env = batch.create_compute_environment(
    computeEnvironmentName='video-transcoding-spot',
    type='MANAGED',
    state='ENABLED',
    computeResources={
        'type': 'SPOT',
        'allocationStrategy': 'SPOT_CAPACITY_OPTIMIZED',
        'minvCpus': 0,
        'maxvCpus': 256,
        'desiredvCpus': 0,
        'instanceTypes': ['c5', 'c5n', 'c5a'],  # Multiple types for better availability
        'subnets': ['subnet-1', 'subnet-2', 'subnet-3'],
        'securityGroupIds': ['sg-0123456789abcdef0'],
        'instanceRole': 'arn:aws:iam::123456789012:instance-profile/ecsInstanceRole',
        'bidPercentage': 70,  # Pay up to 70% of on-demand price
        'spotIamFleetRole': 'arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role'
    }
)

## Create job queue
job_queue = batch.create_job_queue(
    jobQueueName='video-transcoding-queue',
    state='ENABLED',
    priority=100,
    computeEnvironmentOrder=[
        {
            'order': 1,
            'computeEnvironment': 'video-transcoding-spot'
        }
    ]
)

## Define job that processes single video
job_definition = batch.register_job_definition(
    jobDefinitionName='transcode-video',
    type='container',
    containerProperties={
        'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/video-transcoder:latest',
        'vcpus': 4,
        'memory': 8192,
        'command': ['python', 'transcode.py', 'Ref::input_file', 'Ref::output_file'],
        'jobRoleArn': 'arn:aws:iam::123456789012:role/BatchJobRole'
    },
    retryStrategy={
        'attempts': 3,
        'evaluateOnExit': [
            {
                'onStatusReason': 'Host EC2*',  # Retry on spot interruption
                'action': 'RETRY'
            }
        ]
    }
)

## Submit jobs in parallel
videos = s3.list_objects_v2(Bucket='input-videos', Prefix='pending/')['Contents']

for video in videos:
    batch.submit_job(
        jobName=f"transcode-{video['Key'].split('/')[-1]}",
        jobQueue='video-transcoding-queue',
        jobDefinition='transcode-video',
        containerOverrides={
            'command': [
                'python', 'transcode.py',
                f"s3://input-videos/{video['Key']}",
                f"s3://output-videos/{video['Key']}"
            ]
        }
    )

print(f"Submitted {len(videos)} transcoding jobs")

Results:

  • Processing time: 4 hours → 30 minutes (8x faster)
  • Cost: 85% lower using Spot instances
  • Automatic scaling: 0 instances when idle
  • Fault tolerance: Automatic retry on spot interruption

Example 3: Database Performance Tuning

Scenario: RDS PostgreSQL database experiencing slow queries. Average response time: 500ms, peak: 2 seconds.

Investigation:

## Enable Performance Insights and analyze
import boto3
import json

pi = boto3.client('pi')  # Performance Insights

## Get top SQL queries by load
response = pi.get_resource_metrics(
    ServiceType='RDS',
    Identifier='db-ABCDEFGHIJKLMNOPQRS',
    MetricQueries=[
        {
            'Metric': 'db.load.avg',
            'GroupBy': {
                'Group': 'db.sql'
            }
        }
    ],
    StartTime='2024-01-15T00:00:00Z',
    EndTime='2024-01-15T23:59:59Z',
    PeriodInSeconds=3600
)

## Identify slow queries
for metric in response['MetricList']:
    statement = metric['Key']['Dimensions'].get('db.sql.statement', '')
    values = [dp['Value'] for dp in metric['DataPoints'] if 'Value' in dp]
    avg_load = sum(values) / len(values) if values else 0.0
    print(f"Query: {statement[:100]}...")
    print(f"Average Load: {avg_load:.2f}")
    print("---")

Optimization Steps:

-- Step 1: Add missing indexes (identified from Performance Insights)
CREATE INDEX idx_orders_customer_date 
ON orders(customer_id, order_date DESC);

CREATE INDEX idx_products_category 
ON products(category_id) 
WHERE active = true;

-- Step 2: Optimize slow query
-- Before (2 seconds):
SELECT o.*, c.name, p.title 
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date > NOW() - INTERVAL '30 days';

-- After (50ms): Use CTEs and limit joins
WITH recent_orders AS (
  SELECT * FROM orders 
  WHERE order_date > NOW() - INTERVAL '30 days'
  AND customer_id IN (SELECT id FROM customers WHERE active = true)
)
SELECT ro.*, c.name
FROM recent_orders ro
JOIN customers c ON ro.customer_id = c.id;

## Step 3: Implement read replica for reporting queries
import boto3

rds = boto3.client('rds')

## Create read replica
replica = rds.create_db_instance_read_replica(
    DBInstanceIdentifier='mydb-read-replica',
    SourceDBInstanceIdentifier='mydb-primary',
    DBInstanceClass='db.r5.xlarge',  # Memory-optimized for caching
    PubliclyAccessible=False,
    Tags=[
        {'Key': 'Purpose', 'Value': 'reporting'},
        {'Key': 'Environment', 'Value': 'production'}
    ]
)

## Update application to route read queries to replica
## In application config:
## DB_WRITE_ENDPOINT = 'mydb-primary.abcdef.us-east-1.rds.amazonaws.com'
## DB_READ_ENDPOINT = 'mydb-read-replica.abcdef.us-east-1.rds.amazonaws.com'

Results:

  • Average response time: 500ms → 80ms (6.25x faster)
  • Peak response time: 2s → 200ms (10x faster)
  • Primary database CPU: 75% → 40%
  • Read routing required only a configuration change, not application code changes

Example 4: Serverless Architecture Cost Comparison

Scenario: API backend serving 10 million requests/month.

Option A: EC2-based (Current)

## Always-on instances
## 3x m5.large instances behind ALB
## Cost: $219/month (instances) + $23/month (ALB) = $242/month

Option B: Serverless (Proposed)

## AWS Lambda + API Gateway
import json

def lambda_handler(event, context):
    # Process request
    user_id = event['pathParameters']['userId']
    
    # Query DynamoDB
    import boto3
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('Users')
    
    response = table.get_item(Key={'userId': user_id})
    
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(response.get('Item', {}))
    }

## Cost calculation:
## Lambda: 10M requests × $0.20/1M = $2.00
## Lambda compute: 10M × 200ms × $0.0000166667/GB-sec = $33.33 (1GB memory)
## API Gateway: 10M requests × $3.50/1M = $35.00
## DynamoDB: 10M reads × $0.25/1M (on-demand) = $2.50
## Total: $72.83/month (70% savings)

Cost Comparison:

| Component | EC2 Option | Serverless Option |
|---|---|---|
| Compute | $219/month | $35.33/month |
| Load Balancer / API Gateway | $23/month | $35/month |
| Database | RDS: $115/month | DynamoDB: $2.50/month |
| Total Monthly | $357 | $72.83 |
| Annual Savings | | ~$3,410 |
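
A rough break-even sketch using the unit prices from the calculation above, with the EC2/ALB cost held fixed and the database tiers excluded; real prices vary by region and configuration:

## Where does serverless stop being cheaper than the fixed EC2 fleet?
def serverless_monthly_cost(requests_millions, avg_duration_s=0.2, memory_gb=1.0):
    lambda_requests = requests_millions * 0.20
    lambda_compute = requests_millions * 1_000_000 * avg_duration_s * memory_gb * 0.0000166667
    api_gateway = requests_millions * 3.50
    return lambda_requests + lambda_compute + api_gateway

ec2_fixed = 242  # 3x m5.large + ALB, from Option A

for millions in (1, 10, 50, 100):
    cost = serverless_monthly_cost(millions)
    winner = 'serverless' if cost < ec2_fixed else 'EC2'
    print(f"{millions}M requests/month: serverless ${cost:.2f} vs EC2 ${ec2_fixed} -> {winner}")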

Additional Benefits:

  • Zero server management
  • Automatic scaling
  • Pay only for actual usage
  • Built-in high availability

⚠️ Consideration: Serverless works best for sporadic or variable workloads. For consistent high-volume traffic (>50M requests/month), EC2 with Reserved Instances may be more cost-effective.


Common Mistakes & How to Avoid Them ⚠️

Mistake 1: Not Using Tags for Cost Allocation

❌ Wrong:

## Launch instance without tags
ec2.run_instances(
    ImageId='ami-0abcdef1234567890',
    InstanceType='t3.medium',
    MinCount=1,
    MaxCount=1
)

✅ Right:

## Launch with comprehensive tags
ec2.run_instances(
    ImageId='ami-0abcdef1234567890',
    InstanceType='t3.medium',
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[
        {
            'ResourceType': 'instance',
            'Tags': [
                {'Key': 'Name', 'Value': 'web-server-01'},
                {'Key': 'Environment', 'Value': 'production'},
                {'Key': 'Project', 'Value': 'customer-portal'},
                {'Key': 'CostCenter', 'Value': 'engineering'},
                {'Key': 'Owner', 'Value': 'alice@example.com'},
                {'Key': 'AutoShutdown', 'Value': 'false'}
            ]
        }
    ]
)

Why it matters: Without tags, you can't track spending by team, project, or environment. Cost allocation becomes impossible.

Mistake 2: Leaving Resources Running 24/7

❌ Wrong: Development and test environments run continuously, even nights and weekends.

✅ Right:

## Lambda function to stop dev/test instances nights and weekends
import boto3
from datetime import datetime

def lambda_handler(event, context):
    ec2 = boto3.client('ec2')
    
    # Get current time
    now = datetime.now()
    hour = now.hour
    weekday = now.weekday()  # 0=Monday, 6=Sunday
    
    # Define schedule: Stop 7PM-7AM weekdays, all day weekends
    should_stop = (
        hour < 7 or hour >= 19  # Outside 7AM-7PM
    ) or (
        weekday >= 5  # Weekend (Sat/Sun)
    )
    
    if should_stop:
        # Find instances tagged for auto-shutdown
        instances = ec2.describe_instances(
            Filters=[
                {'Name': 'tag:AutoShutdown', 'Values': ['true']},
                {'Name': 'instance-state-name', 'Values': ['running']}
            ]
        )
        
        instance_ids = []
        for reservation in instances['Reservations']:
            for instance in reservation['Instances']:
                instance_ids.append(instance['InstanceId'])
        
        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
            print(f"Stopped {len(instance_ids)} instances")
    
    return {'statusCode': 200}

## Schedule with EventBridge: cron(0 7,19 * * ? *)
## Runs at 7AM and 7PM daily

Savings: roughly 65% reduction in compute spend for dev/test environments (instances run about 60 hours/week instead of 168).
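
A hedged sketch of wiring the function to that schedule; the rule name, function name, and Lambda ARN are placeholders:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

rule = events.put_rule(
    Name='dev-instance-scheduler',
    ScheduleExpression='cron(0 7,19 * * ? *)',  # 7AM and 7PM UTC daily
    State='ENABLED'
)

## Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName='dev-instance-scheduler',
    StatementId='allow-eventbridge-invoke',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn']
)

events.put_targets(
    Rule='dev-instance-scheduler',
    Targets=[{
        'Id': 'stop-dev-instances',
        'Arn': 'arn:aws:lambda:us-east-1:123456789012:function:dev-instance-scheduler'
    }]
)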

Mistake 3: Not Monitoring Burst Credits on T-Instances

❌ Wrong: Using t3.medium for an application that consistently needs high CPU, leading to credit exhaustion and throttling.

✅ Right:

## Create CloudWatch alarm for low CPU credits
import boto3

cloudwatch = boto3.client('cloudwatch')

cloudwatch.put_metric_alarm(
    AlarmName='low-cpu-credits-web-01',
    ComparisonOperator='LessThanThreshold',
    EvaluationPeriods=2,
    MetricName='CPUCreditBalance',
    Namespace='AWS/EC2',
    Period=300,
    Statistic='Average',
    Threshold=100.0,  # Alert when credits drop below 100
    ActionsEnabled=True,
    AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
    AlarmDescription='Alert when T-instance running low on CPU credits',
    Dimensions=[
        {'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}
    ]
)

Better Solution: Switch to M-series for consistent CPU needs, or enable T3 Unlimited mode:

aws ec2 modify-instance-credit-specification \
  --instance-credit-specification "InstanceId=i-0123456789abcdef0,CpuCredits=unlimited"

Mistake 4: Using On-Demand for Predictable Workloads

❌ Wrong:

## Running database on on-demand pricing
## db.r5.xlarge on-demand: $0.252/hour = $183.96/month

✅ Right:

## Purchase Reserved Instance for database
## db.r5.xlarge 1-year partial upfront: $0.155/hour = $113.15/month
## Savings: $70.81/month ($850/year)

rds = boto3.client('rds')

## Purchase RDS Reserved Instance
response = rds.purchase_reserved_db_instances_offering(
    ReservedDBInstancesOfferingId='offering-12345678',
    ReservedDBInstanceId='my-reserved-db',
    DBInstanceCount=1
)

Mistake 5: Not Using S3 Lifecycle Policies

❌ Wrong: Storing all logs in S3 Standard forever.

✅ Right:

{
  "Rules": [
    {
      "Id": "intelligent-log-management",
      "Status": "Enabled",
      "Filter": {"Prefix": "logs/"},
      "Transitions": [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"},
        {"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
      ],
      "Expiration": {"Days": 2555},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 30}
    }
  ]
}

Savings Example:

  • 1TB logs/month, retained for 7 years
  • Standard: $0.023/GB = $23/month × 84 months = $1,932
  • With lifecycle: $23 (month 1) + $12.80/month (months 2-3) + $4/month (months 4-12) + $1/month (years 2-7) ≈ $157 total (about 92% savings)

Mistake 6: Ignoring Network Transfer Costs

❌ Wrong: Transferring data between regions unnecessarily.

## Application in us-east-1 reading from S3 in eu-west-1
## 10TB/month × $0.02/GB = $204/month in data transfer fees

✅ Right:

## Use S3 Cross-Region Replication to keep data local
s3 = boto3.client('s3')

s3.put_bucket_replication(
    Bucket='my-source-bucket',
    ReplicationConfiguration={
        'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
        'Rules': [
            {
                'ID': 'replicate-to-us-east-1',
                'Status': 'Enabled',
                'Priority': 1,
                'Filter': {'Prefix': ''},
                'Destination': {
                    'Bucket': 'arn:aws:s3:::my-replica-bucket-us-east-1',
                    'StorageClass': 'STANDARD_IA'  # Use cheaper storage class
                }
            }
        ]
    }
)

## Application reads from local replica
## One-time replication cost: 10TB × $0.02/GB = $204
## Ongoing monthly transfer: $0 (local reads)
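
One prerequisite worth noting: Cross-Region Replication requires versioning on both the source and destination buckets. A minimal sketch:

import boto3

s3 = boto3.client('s3')

for bucket in ('my-source-bucket', 'my-replica-bucket-us-east-1'):
    s3.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={'Status': 'Enabled'}
    )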

Key Takeaways 🎯

📋 Quick Reference Card

Cost Optimization Priorities:

  1. Right-size first - 30-50% savings typically available
  2. Use Reserved Instances/Savings Plans - For predictable workloads (up to 72% off)
  3. Implement Auto Scaling - Pay only for what you need
  4. Leverage Spot Instances - For fault-tolerant workloads (up to 90% off)
  5. Apply S3 Lifecycle Policies - Automatically tier storage (up to 95% off)
  6. Tag everything - Enable cost allocation and tracking
  7. Set up billing alerts - Catch anomalies early

Performance Optimization Checklist:

  • ✅ Use appropriate instance family (C/R/M/I/P)
  • ✅ Enable Enhanced Networking for latency-sensitive apps
  • ✅ Implement caching layers (ElastiCache, DAX, CloudFront)
  • ✅ Use read replicas for read-heavy databases
  • ✅ Deploy resources in multiple AZs for high availability
  • ✅ Enable CloudWatch detailed monitoring
  • ✅ Use Placement Groups for tightly coupled workloads
  • ✅ Optimize database queries and add indexes

Cost & Performance Tools:

| Tool | Purpose | Key Metric |
|---|---|---|
| Cost Explorer | Spending analysis | Monthly cost trends |
| AWS Budgets | Cost alerts | Budget vs actual |
| Compute Optimizer | Right-sizing | Utilization % |
| Trusted Advisor | Best practices | Checks passed/failed |
| CloudWatch | Performance monitoring | Resource metrics |
| Performance Insights | Database tuning | Query load |

Golden Rules:

  1. Measure before optimizing - Get baseline metrics
  2. One change at a time - Isolate impact
  3. Automate everything - Reduce human error
  4. Review monthly - Costs and performance drift
  5. Test in non-prod first - Validate changes safely

🧠 Memory Device - CRAP²: Cache aggressively, Right-size resources, Automate scaling, Plan for reserved capacity, Performance test continuously.

💡 Final Tip: The AWS Well-Architected Tool provides free assessments. Use it quarterly to identify cost and performance optimization opportunities.


📚 Further Study

  1. AWS Well-Architected Framework - Cost Optimization Pillar
    https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html

  2. AWS Cost Optimization Best Practices
    https://aws.amazon.com/pricing/cost-optimization/

  3. Amazon CloudWatch User Guide - Performance Metrics
    https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html


Congratulations! 🎉 You now understand the core principles of AWS cost and performance engineering. Practice implementing these techniques in your own AWS environment, starting with the low-hanging fruit: tagging resources, setting up billing alerts, and analyzing your Cost Explorer data. Remember: optimization is an ongoing process, not a one-time project.