Cost & Performance Engineering
FinOps, cost allocation, tagging, rightsizing, and designing for failure
AWS Cost & Performance Engineering
Master AWS cost optimization and performance engineering with free flashcards and spaced repetition practice. This lesson covers cost monitoring strategies, right-sizing resources, performance tuning techniques, and architectural optimization patterns: essential concepts for the AWS Solutions Architect and SysOps Administrator certifications.
Welcome to Cost & Performance Engineering
In cloud computing, cost optimization and performance engineering are two sides of the same coin. Organizations often overprovision resources "just to be safe," resulting in wasted spend, or undersize resources to save money, creating performance bottlenecks. The art of AWS cost and performance engineering lies in finding the sweet spot where your applications run efficiently at the lowest sustainable cost.
Why This Matters:
- Industry surveys consistently estimate that organizations waste roughly a third of their cloud spend on unused or oversized resources
- Poor performance impacts user experience and business revenue
- Well-architected systems achieve both cost efficiency and high performance
- AWS offers over 200 services; choosing the right combination saves money and boosts speed
Pro Tip: The AWS Well-Architected Framework includes both Cost Optimization and Performance Efficiency among its six pillars. Mastering both makes you invaluable to any organization.
Core Concepts: Understanding AWS Cost Structure
1. AWS Pricing Models
AWS offers several pricing models, each suited to different workload patterns:
| Pricing Model | Best For | Savings vs On-Demand | Commitment |
|---|---|---|---|
| On-Demand | Variable workloads, testing | Baseline (0%) | None |
| Reserved Instances | Steady-state workloads | Up to 72% | 1-3 years |
| Savings Plans | Flexible compute usage | Up to 66% | 1-3 years |
| Spot Instances | Fault-tolerant, flexible workloads | Up to 90% | None (can be interrupted) |
Reserved Instances (RIs) provide the deepest discounts but require upfront commitment to specific instance families and regions. Think of them like buying a gym membership: you pay up front for guaranteed access.
Savings Plans offer similar discounts but with more flexibility. You commit to a dollar amount per hour (e.g., $10/hour) rather than specific instance types. The plan automatically applies to any matching compute usage.
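For a rough sense of how these models compare, a back-of-the-envelope calculation like the sketch below works well. The hourly rates are illustrative assumptions, not current prices, so check the pricing pages (or the Pricing API) before committing.
## Rough comparison of pricing models for a single m5.large (illustrative rates)
HOURS_PER_MONTH = 730

on_demand_rate = 0.096      # $/hour, approximate On-Demand rate
savings_plan_rate = 0.065   # $/hour, assumed 1-year Compute Savings Plan effective rate
reserved_rate = 0.060       # $/hour, assumed 1-year Reserved Instance effective rate

for label, rate in [('On-Demand', on_demand_rate),
                    ('Savings Plan', savings_plan_rate),
                    ('Reserved Instance', reserved_rate)]:
    monthly = rate * HOURS_PER_MONTH
    discount = (1 - rate / on_demand_rate) * 100
    print(f"{label:18s} ${monthly:8.2f}/month ({discount:.0f}% vs On-Demand)")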
Spot Instances leverage AWS's unused capacity. AWS can reclaim them with 2-minute notice, making them perfect for batch processing, CI/CD, data analysis, and containerized workloads with checkpointing.
## Example: Launching a Spot Instance with boto3
import boto3
ec2 = boto3.client('ec2')
response = ec2.request_spot_instances(
SpotPrice='0.05',
InstanceCount=1,
Type='one-time',
LaunchSpecification={
'ImageId': 'ami-0abcdef1234567890',
'InstanceType': 't3.medium',
'KeyName': 'my-key-pair',
'SecurityGroupIds': ['sg-0123456789abcdef0'],
'SubnetId': 'subnet-12345678',
'IamInstanceProfile': {
'Name': 'MyInstanceProfile'
}
}
)
print(f"Spot request ID: {response['SpotInstanceRequests'][0]['SpotInstanceRequestId']}")
Memory Device - ROSS: Reserved for steady loads, On-demand for flexibility, Savings Plans for compute flexibility, Spot for interruptible workloads.
2. Cost Monitoring & Visibility Tools
AWS Cost Explorer provides visual analysis of spending patterns over time. It shows:
- Historical cost data (up to 12 months)
- Forecasting for next 12 months
- Cost breakdown by service, linked account, tag, or region
- Reserved Instance recommendations
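The same data is available programmatically through the Cost Explorer API, which is handy for custom dashboards or weekly cost reports. A minimal sketch (each API request is billed at $0.01):
## Last month's spend grouped by service, via the Cost Explorer API
import boto3
from datetime import date, timedelta

ce = boto3.client('ce')

end = date.today().replace(day=1)                  # first day of the current month
start = (end - timedelta(days=1)).replace(day=1)   # first day of the previous month

response = ce.get_cost_and_usage(
    TimePeriod={'Start': start.isoformat(), 'End': end.isoformat()},
    Granularity='MONTHLY',
    Metrics=['UnblendedCost'],
    GroupBy=[{'Type': 'DIMENSION', 'Key': 'SERVICE'}]
)

for group in response['ResultsByTime'][0]['Groups']:
    amount = float(group['Metrics']['UnblendedCost']['Amount'])
    if amount > 1:  # skip sub-dollar noise
        print(f"{group['Keys'][0]}: ${amount:.2f}")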
AWS Budgets creates custom alerts when costs or usage exceed thresholds:
{
"Budget": {
"BudgetName": "Monthly-EC2-Budget",
"BudgetLimit": {
"Amount": "1000",
"Unit": "USD"
},
"TimeUnit": "MONTHLY",
"BudgetType": "COST",
"CostFilters": {
"Service": ["Amazon Elastic Compute Cloud - Compute"]
}
},
"NotificationsWithSubscribers": [
{
"Notification": {
"NotificationType": "ACTUAL",
"ComparisonOperator": "GREATER_THAN",
"Threshold": 80,
"ThresholdType": "PERCENTAGE"
},
"Subscribers": [
{
"SubscriptionType": "EMAIL",
"Address": "ops-team@example.com"
}
]
}
]
}
AWS Cost and Usage Report (CUR) delivers the most granular data: hourly resource usage with custom tags. It writes to S3 for analysis with Athena, QuickSight, or third-party tools.
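As a sketch of what that analysis looks like, the query below sums unblended cost per service from a CUR table in Athena. The database, table, and results-bucket names are placeholders for whatever your CUR/Athena integration created; the column names follow the standard CUR schema.
## Top services by spend from the Cost and Usage Report in Athena
import boto3

athena = boto3.client('athena')

query = """
SELECT line_item_product_code,
       ROUND(SUM(line_item_unblended_cost), 2) AS cost
FROM cur_table
WHERE year = '2024' AND month = '12'
GROUP BY line_item_product_code
ORDER BY cost DESC
LIMIT 10
"""

athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'cur_db'},                          # placeholder CUR database
    ResultConfiguration={'OutputLocation': 's3://my-athena-results/cur/'}  # placeholder results bucket
)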
AWS Trusted Advisor provides real-time recommendations across several check categories, including cost optimization. Typical cost findings include:
- Low utilization EC2 instances
- Idle RDS databases
- Unassociated Elastic IPs
- Underutilized EBS volumes
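Trusted Advisor results can also be pulled programmatically through the AWS Support API (a Business or Enterprise Support plan is required). A minimal sketch that counts flagged resources per cost check; the category value follows the API's naming:
## List flagged resources for Trusted Advisor cost optimization checks
import boto3

support = boto3.client('support', region_name='us-east-1')  # Support API endpoint

checks = support.describe_trusted_advisor_checks(language='en')['checks']
for check in checks:
    if check['category'] == 'cost_optimizing':
        result = support.describe_trusted_advisor_check_result(checkId=check['id'])
        flagged = result['result'].get('flaggedResources', [])
        print(f"{check['name']}: {len(flagged)} flagged resources")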
COST MONITORING WORKFLOW

Cost Explorer
    ↓
Identify high-spend services
    ↓
Drill down by tag/resource
    ↓
Trusted Advisor checks
    ↓
Create optimization plan
    ↓
Set Budget alerts
    ↓
Monitor CUR in Athena
    ↓
Implement changes
    ↓
Measure impact
Pro Tip: Tag everything! Use tags like Environment, Project, Owner, and CostCenter to enable detailed cost allocation. AWS supports up to 50 tags per resource.
3. Right-Sizing Resources
Right-sizing means matching instance types and sizes to actual workload requirements. Most organizations overprovision by 30-50%.
CloudWatch Metrics reveal actual resource utilization:
## Check average CPU utilization over 14 days
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
response = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[
{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}
],
StartTime=datetime.utcnow() - timedelta(days=14),
EndTime=datetime.utcnow(),
Period=3600, # 1 hour intervals
Statistics=['Average', 'Maximum']
)
avg_cpu = sum([d['Average'] for d in response['Datapoints']]) / len(response['Datapoints'])
max_cpu = max([d['Maximum'] for d in response['Datapoints']])
print(f"Average CPU: {avg_cpu:.2f}%")
print(f"Maximum CPU: {max_cpu:.2f}%")
if avg_cpu < 10:
    print("⚠️ Consider downsizing or stopping this instance")
elif avg_cpu > 70:
    print("⚠️ Consider upsizing to prevent performance issues")
AWS Compute Optimizer uses machine learning to analyze historical utilization and recommend optimal instance types:
## Get recommendations via AWS CLI
aws compute-optimizer get-ec2-instance-recommendations \
--instance-arns arn:aws:ec2:us-east-1:123456789012:instance/i-0123456789abcdef0
Output includes:
- Current instance type and pricing
- Recommended instance types (up to 3 options)
- Projected savings
- Performance risk assessment (Very Low, Low, Medium, High)
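The same recommendations can be pulled with boto3 and folded into a right-sizing report. A sketch using the Compute Optimizer API; the response field names below are as I recall them, so verify against the current SDK documentation:
## Pull Compute Optimizer recommendations with boto3
import boto3

co = boto3.client('compute-optimizer')

response = co.get_ec2_instance_recommendations()
for rec in response['instanceRecommendations']:
    best = rec['recommendationOptions'][0]  # options are ranked, best first
    print(rec['instanceArn'])
    print(f"  finding: {rec['finding']} (current: {rec['currentInstanceType']})")
    print(f"  recommended: {best['instanceType']} (performance risk: {best['performanceRisk']})")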
| Metric | Target Range | Action if Below | Action if Above |
|---|---|---|---|
| CPU | 40-70% | Downsize | Upsize or scale out |
| Memory | 50-80% | Switch to compute-optimized | Switch to memory-optimized |
| Network | 30-60% | Consider smaller instance | Enable enhanced networking |
| Disk I/O | 40-70% | Use gp3 instead of io2 | Increase IOPS or use io2 |
4. Storage Optimization Strategies
S3 Storage Classes offer tiered pricing based on access patterns:
| Storage Class | Use Case | Availability | Cost (relative) |
|---|---|---|---|
| S3 Standard | Frequently accessed data | 99.99% | $$$$ |
| S3 Intelligent-Tiering | Unknown/changing access | 99.9% | $$$$ (auto-optimized) |
| S3 Standard-IA | Infrequent access | 99.9% | $$$ |
| S3 One Zone-IA | Infrequent, reproducible | 99.5% | $$ |
| S3 Glacier Instant | Archive, instant retrieval | 99.9% | $$ |
| S3 Glacier Flexible | Archive, minutes-to-hours retrieval | 99.99% | $ |
| S3 Glacier Deep Archive | Long-term archive, 12hr retrieval | 99.99% | ¢ |
S3 Lifecycle Policies automatically transition objects between classes:
{
"Rules": [
{
"Id": "LogArchivalPolicy",
"Status": "Enabled",
"Filter": {
"Prefix": "logs/"
},
"Transitions": [
{
"Days": 30,
"StorageClass": "STANDARD_IA"
},
{
"Days": 90,
"StorageClass": "GLACIER_IR"
},
{
"Days": 365,
"StorageClass": "DEEP_ARCHIVE"
}
],
"Expiration": {
"Days": 2555
}
}
]
}
EBS Volume Optimization:
## Identify unattached EBS volumes (wasted cost)
import boto3
ec2 = boto3.client('ec2')
volumes = ec2.describe_volumes(
Filters=[{'Name': 'status', 'Values': ['available']}]
)['Volumes']
total_cost = 0
for vol in volumes:
    size_gb = vol['Size']
    vol_type = vol['VolumeType']
    # Approximate monthly cost (gp3 = $0.08/GB)
    monthly_cost = size_gb * 0.08
    total_cost += monthly_cost
    print(f"Volume {vol['VolumeId']}: {size_gb}GB {vol_type} = ${monthly_cost:.2f}/month")
print(f"\nTotal wasted spend on unattached volumes: ${total_cost:.2f}/month")
gp3 vs gp2: gp3 volumes are 20% cheaper than gp2 and provide better performance. You can provision IOPS and throughput independently.
## Modify gp2 volume to gp3
aws ec2 modify-volume \
--volume-id vol-0123456789abcdef0 \
--volume-type gp3 \
--iops 3000 \
--throughput 125
Did You Know? S3 Intelligent-Tiering automatically moves objects between access tiers based on usage patterns with no retrieval fees. It's perfect when you can't predict access patterns.
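To go beyond the automatic frequent/infrequent tiers, you opt the bucket into the optional archive tiers with a configuration like the sketch below (the bucket name is a placeholder). Objects still need to be uploaded or transitioned into the INTELLIGENT_TIERING storage class to participate.
## Opt Intelligent-Tiering objects into the optional archive tiers
import boto3

s3 = boto3.client('s3')

s3.put_bucket_intelligent_tiering_configuration(
    Bucket='my-data-bucket',
    Id='archive-after-90-days',
    IntelligentTieringConfiguration={
        'Id': 'archive-after-90-days',
        'Status': 'Enabled',
        'Tierings': [
            {'Days': 90, 'AccessTier': 'ARCHIVE_ACCESS'},
            {'Days': 180, 'AccessTier': 'DEEP_ARCHIVE_ACCESS'}
        ]
    }
)

## Upload objects directly into the Intelligent-Tiering class, e.g.:
## s3.put_object(Bucket='my-data-bucket', Key='data.csv', Body=data,
##               StorageClass='INTELLIGENT_TIERING')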
Performance Engineering Fundamentals
5. Compute Performance Optimization
Instance Family Selection:
| Family | Code | Optimized For | Example Use Cases |
|---|---|---|---|
| General Purpose | T, M, A | Balanced CPU/memory | Web servers, small databases |
| Compute Optimized | C | High CPU performance | Batch processing, gaming, HPC |
| Memory Optimized | R, X, Z | Large in-memory datasets | Caching, in-memory databases, big data |
| Storage Optimized | I, D, H | High disk I/O | NoSQL databases, data warehouses |
| Accelerated Computing | P, G, F, Inf | GPU/FPGA workloads | ML training, graphics rendering |
Burstable Instances (T-series) accumulate CPU credits when idle and consume them during bursts:
## Monitor T-instance CPU credit balance
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
response = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUCreditBalance',
Dimensions=[{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}],
StartTime=datetime.utcnow() - timedelta(hours=24),
EndTime=datetime.utcnow(),
Period=3600,
Statistics=['Average']
)
datapoints = sorted(response['Datapoints'], key=lambda d: d['Timestamp'])
current_credits = datapoints[-1]['Average'] if datapoints else 0
print(f"Current CPU Credit Balance: {current_credits:.0f}")
if current_credits < 50:
    print("⚠️ Low credit balance! Consider switching to unlimited mode or M-series")
Enhanced Networking provides higher bandwidth, higher packet-per-second performance, and lower latency:
## Enable ENA (Elastic Network Adapter) on an instance
aws ec2 modify-instance-attribute \
--instance-id i-0123456789abcdef0 \
--ena-support
Placement Groups control instance placement for performance:
- Cluster: Low latency, high throughput (same AZ)
- Partition: Isolated failure domains (different hardware)
- Spread: Each instance on separate hardware (max 7 per AZ)
## Create a cluster placement group
import boto3
ec2 = boto3.client('ec2')
ec2.create_placement_group(
GroupName='my-hpc-cluster',
Strategy='cluster'
)
## Launch instances into the placement group
ec2.run_instances(
ImageId='ami-0abcdef1234567890',
InstanceType='c5n.18xlarge',
MinCount=3,
MaxCount=3,
Placement={
'GroupName': 'my-hpc-cluster'
}
)
6. Database Performance Tuning
RDS Performance Insights identifies performance bottlenecks:
## Enable Performance Insights on RDS instance
import boto3
rds = boto3.client('rds')
rds.modify_db_instance(
DBInstanceIdentifier='my-database',
EnablePerformanceInsights=True,
PerformanceInsightsRetentionPeriod=7 # days
)
Performance Insights shows:
- Top SQL queries by load
- Wait events (I/O, CPU, locks)
- Database load over time
Read Replicas offload read traffic from primary:
## Create read replica
rds.create_db_instance_read_replica(
DBInstanceIdentifier='my-db-replica',
SourceDBInstanceIdentifier='my-database',
DBInstanceClass='db.r5.large',
AvailabilityZone='us-east-1b',
PubliclyAccessible=False
)
Aurora Serverless v2 auto-scales compute capacity:
## Create Aurora Serverless v2 cluster
rds.create_db_cluster(
DBClusterIdentifier='my-serverless-cluster',
Engine='aurora-mysql',
EngineVersion='8.0.mysql_aurora.3.02.0',
ServerlessV2ScalingConfiguration={
'MinCapacity': 0.5, # ACUs (Aurora Capacity Units)
'MaxCapacity': 2.0
},
MasterUsername='admin',
MasterUserPassword='SecurePassword123!',
DatabaseName='myapp'
)
## Note: Aurora Serverless v2 also needs at least one DB instance with
## DBInstanceClass='db.serverless' added to this cluster to provide compute
DynamoDB Performance:
## Enable auto-scaling for DynamoDB table
import boto3
application_autoscaling = boto3.client('application-autoscaling')
## Register table as scalable target
application_autoscaling.register_scalable_target(
ServiceNamespace='dynamodb',
ResourceId='table/my-table',
ScalableDimension='dynamodb:table:ReadCapacityUnits',
MinCapacity=5,
MaxCapacity=100
)
## Create scaling policy
application_autoscaling.put_scaling_policy(
PolicyName='my-table-read-scaling',
ServiceNamespace='dynamodb',
ResourceId='table/my-table',
ScalableDimension='dynamodb:table:ReadCapacityUnits',
PolicyType='TargetTrackingScaling',
TargetTrackingScalingPolicyConfiguration={
'TargetValue': 70.0, # Target 70% utilization
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
},
'ScaleOutCooldown': 60,
'ScaleInCooldown': 60
}
)
DynamoDB Accelerator (DAX) provides in-memory caching:
## Create DAX cluster
import boto3
dax = boto3.client('dax')
dax.create_cluster(
ClusterName='my-dax-cluster',
NodeType='dax.r5.large',
ReplicationFactor=3,
IamRoleArn='arn:aws:iam::123456789012:role/DAXServiceRole',
SubnetGroupName='my-subnet-group'
)
DAX reduces read latency from milliseconds to microseconds for repeated queries.
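Applications don't reach the cache through the regular DynamoDB endpoint; they connect to the cluster's discovery endpoint using the DAX SDK client. A sketch for looking that endpoint up once the cluster is available:
## Look up the DAX cluster endpoint for the application to connect to
import boto3

dax = boto3.client('dax')

cluster = dax.describe_clusters(ClusterNames=['my-dax-cluster'])['Clusters'][0]
endpoint = cluster['ClusterDiscoveryEndpoint']
print(f"DAX endpoint: {endpoint['Address']}:{endpoint['Port']}")
## The app then uses the DAX client library (e.g. amazon-dax-client for Python)
## pointed at this endpoint; reads are served from the in-memory cache when possible.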
7. Network Performance Optimization
CloudFront CDN caches content at edge locations:
## Create CloudFront distribution
import boto3
cloudfront = boto3.client('cloudfront')
distribution = cloudfront.create_distribution(
DistributionConfig={
'CallerReference': 'my-distribution-2024',
'Origins': {
'Quantity': 1,
'Items': [
{
'Id': 's3-origin',
'DomainName': 'my-bucket.s3.amazonaws.com',
'S3OriginConfig': {
'OriginAccessIdentity': ''
}
}
]
},
'DefaultCacheBehavior': {
'TargetOriginId': 's3-origin',
'ViewerProtocolPolicy': 'redirect-to-https',
'AllowedMethods': {
'Quantity': 2,
'Items': ['GET', 'HEAD']
},
'Compress': True,
'MinTTL': 0,
'DefaultTTL': 86400, # 24 hours
'MaxTTL': 31536000, # 1 year
'ForwardedValues': {
'QueryString': False,
'Cookies': {'Forward': 'none'}
}
},
'Comment': 'Static content distribution',
'Enabled': True
}
)
Global Accelerator routes user traffic onto the AWS global network:
## Create Global Accelerator
import boto3

## The Global Accelerator API is served from the us-west-2 region
globalaccelerator = boto3.client('globalaccelerator', region_name='us-west-2')
accelerator = globalaccelerator.create_accelerator(
Name='my-accelerator',
IpAddressType='IPV4',
Enabled=True
)
## Add listener
listener = globalaccelerator.create_listener(
AcceleratorArn=accelerator['Accelerator']['AcceleratorArn'],
PortRanges=[{'FromPort': 80, 'ToPort': 80}],
Protocol='TCP'
)
Global Accelerator improves performance by:
- Routing traffic over AWS backbone (not public internet)
- Providing static anycast IPs
- Automatic failover to healthy endpoints
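To finish the setup, the listener needs an endpoint group pointing at your actual endpoints, for example an Application Load Balancer; the ALB ARN below is a placeholder. A sketch continuing the snippet above:
## Attach an Application Load Balancer to the accelerator's listener
endpoint_group = globalaccelerator.create_endpoint_group(
    ListenerArn=listener['Listener']['ListenerArn'],
    EndpointGroupRegion='us-east-1',
    EndpointConfigurations=[
        {
            'EndpointId': 'arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123',  # placeholder ALB ARN
            'Weight': 128,
            'ClientIPPreservationEnabled': True
        }
    ],
    HealthCheckIntervalSeconds=30,
    ThresholdCount=3
)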
CONTENT DELIVERY DECISION TREE

Need caching?
├── YES
│   ├── Static content? YES → CloudFront
│   └── Static content? NO  → API Gateway (caching)
└── NO
    ├── Real-time? YES → Global Accelerator
    └── Real-time? NO  → Direct to origin
Practical Examples with Real-World Scenarios
Example 1: E-commerce Platform Cost Optimization
Scenario: An e-commerce company runs 50 m5.large EC2 instances 24/7 for its web tier, at an on-demand cost of roughly $3,650/month.
Analysis:
- Traffic peaks during business hours (9am-9pm)
- Average CPU utilization: 25%
- Peak CPU: 60%
Optimization Strategy:
## Step 1: Analyze current usage
import boto3
from datetime import datetime, timedelta
cloudwatch = boto3.client('cloudwatch')
## Get average CPU across all instances
response = cloudwatch.get_metric_statistics(
Namespace='AWS/EC2',
MetricName='CPUUtilization',
Dimensions=[{'Name': 'AutoScalingGroupName', 'Value': 'web-tier-asg'}],
StartTime=datetime.utcnow() - timedelta(days=30),
EndTime=datetime.utcnow(),
Period=3600,
Statistics=['Average', 'Maximum']
)
## Step 2: Implement changes
ec2 = boto3.client('ec2')
autoscaling = boto3.client('autoscaling')
## Purchase Reserved Instances for baseline (20 instances)
ec2.purchase_reserved_instances_offering(
InstanceCount=20,
ReservedInstancesOfferingId='offering-12345678', # m5.large, 1-year, partial upfront
)
## Configure Auto Scaling for variable load (10-40 instances)
autoscaling.put_scaling_policy(
AutoScalingGroupName='web-tier-asg',
PolicyName='target-tracking-scaling',
PolicyType='TargetTrackingScaling',
TargetTrackingConfiguration={
'PredefinedMetricSpecification': {
'PredefinedMetricType': 'ASGAverageCPUUtilization'
},
'TargetValue': 50.0
}
)
## Step 3: Add Spot Instances for additional capacity
autoscaling.create_auto_scaling_group(
AutoScalingGroupName='web-tier-spot-asg',
MixedInstancesPolicy={
'InstancesDistribution': {
'OnDemandBaseCapacity': 0,
'OnDemandPercentageAboveBaseCapacity': 20, # 20% on-demand, 80% spot
'SpotAllocationStrategy': 'capacity-optimized'
},
'LaunchTemplate': {
'LaunchTemplateSpecification': {
'LaunchTemplateId': 'lt-0123456789abcdef0',
'Version': '$Latest'
},
'Overrides': [
{'InstanceType': 'm5.large'},
{'InstanceType': 'm5a.large'}, # Alternative for better spot availability
{'InstanceType': 'm5n.large'}
]
}
},
MinSize=10,
MaxSize=40,
DesiredCapacity=20,
VPCZoneIdentifier='subnet-1,subnet-2,subnet-3'
)
Results:
- 20 Reserved Instances covering the steady baseline: ~$1,460/month at the on-demand-equivalent rate, lowered further by the RI discount
- 10-30 instances via Auto Scaling (mix of On-Demand and Spot): ~$800/month average
- Total cost: ~$2,260/month (roughly a 38% reduction)
- Annual savings: roughly $16,680
Example 2: Media Processing Pipeline Performance
Scenario: Video transcoding pipeline processes 1,000 videos/day. Current processing time: 4 hours average per batch.
Bottleneck Analysis:
- Using c5.2xlarge instances (8 vCPU)
- Single-threaded processing
- No parallelization
Optimization:
## Implement parallel processing with AWS Batch
import boto3
import json
batch = boto3.client('batch')
s3 = boto3.client('s3')
## Define compute environment with Spot instances
compute_env = batch.create_compute_environment(
computeEnvironmentName='video-transcoding-spot',
type='MANAGED',
state='ENABLED',
computeResources={
'type': 'SPOT',
'allocationStrategy': 'SPOT_CAPACITY_OPTIMIZED',
'minvCpus': 0,
'maxvCpus': 256,
'desiredvCpus': 0,
'instanceTypes': ['c5', 'c5n', 'c5a'], # Multiple types for better availability
'subnets': ['subnet-1', 'subnet-2', 'subnet-3'],
'securityGroupIds': ['sg-0123456789abcdef0'],
'instanceRole': 'arn:aws:iam::123456789012:instance-profile/ecsInstanceRole',
'bidPercentage': 70, # Pay up to 70% of on-demand price
'spotIamFleetRole': 'arn:aws:iam::123456789012:role/aws-ec2-spot-fleet-role'
}
)
## Create job queue
job_queue = batch.create_job_queue(
jobQueueName='video-transcoding-queue',
state='ENABLED',
priority=100,
computeEnvironmentOrder=[
{
'order': 1,
'computeEnvironment': 'video-transcoding-spot'
}
]
)
## Define job that processes single video
job_definition = batch.register_job_definition(
jobDefinitionName='transcode-video',
type='container',
containerProperties={
'image': '123456789012.dkr.ecr.us-east-1.amazonaws.com/video-transcoder:latest',
'vcpus': 4,
'memory': 8192,
'command': ['python', 'transcode.py', 'Ref::input_file', 'Ref::output_file'],
'jobRoleArn': 'arn:aws:iam::123456789012:role/BatchJobRole'
},
retryStrategy={
'attempts': 3,
'evaluateOnExit': [
{
'onStatusReason': 'Host EC2*', # Retry on spot interruption
'action': 'RETRY'
}
]
}
)
## Submit jobs in parallel
videos = s3.list_objects_v2(Bucket='input-videos', Prefix='pending/').get('Contents', [])
for video in videos:
    batch.submit_job(
        jobName=f"transcode-{video['Key'].split('/')[-1]}",
        jobQueue='video-transcoding-queue',
        jobDefinition='transcode-video',
        containerOverrides={
            'command': [
                'python', 'transcode.py',
                f"s3://input-videos/{video['Key']}",
                f"s3://output-videos/{video['Key']}"
            ]
        }
    )
print(f"Submitted {len(videos)} transcoding jobs")
Results:
- Processing time: 4 hours → 30 minutes (8x faster)
- Cost: 85% lower using Spot instances
- Automatic scaling: 0 instances when idle
- Fault tolerance: Automatic retry on spot interruption
Example 3: Database Performance Tuning
Scenario: RDS PostgreSQL database experiencing slow queries. Average response time: 500ms, peak: 2 seconds.
Investigation:
## Enable Performance Insights and analyze
import boto3
import json
pi = boto3.client('pi') # Performance Insights
## Get top SQL queries by load
response = pi.get_resource_metrics(
ServiceType='RDS',
Identifier='db-ABCDEFGHIJKLMNOPQRS',
MetricQueries=[
{
'Metric': 'db.load.avg',
'GroupBy': {
'Group': 'db.sql'
}
}
],
StartTime='2024-01-15T00:00:00Z',
EndTime='2024-01-15T23:59:59Z',
PeriodInSeconds=3600
)
## Identify slow queries
for metric in response['MetricList']:
    dims = metric['Key'].get('Dimensions', {})
    statement = dims.get('db.sql.statement', '(total)')
    values = [dp['Value'] for dp in metric['DataPoints'] if dp.get('Value') is not None]
    avg_load = sum(values) / len(values) if values else 0.0
    print(f"Query: {statement[:100]}...")
    print(f"Average Load: {avg_load:.2f}")
    print("---")
Optimization Steps:
-- Step 1: Add missing indexes (identified from Performance Insights)
CREATE INDEX idx_orders_customer_date
ON orders(customer_id, order_date DESC);
CREATE INDEX idx_products_category
ON products(category_id)
WHERE active = true;
-- Step 2: Optimize slow query
-- Before (2 seconds):
SELECT o.*, c.name, p.title
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date > NOW() - INTERVAL '30 days';
-- After (50ms): Use CTEs and limit joins
WITH recent_orders AS (
SELECT * FROM orders
WHERE order_date > NOW() - INTERVAL '30 days'
AND customer_id IN (SELECT id FROM customers WHERE active = true)
)
SELECT ro.*, c.name
FROM recent_orders ro
JOIN customers c ON ro.customer_id = c.id;
## Step 3: Implement read replica for reporting queries
import boto3
rds = boto3.client('rds')
## Create read replica
replica = rds.create_db_instance_read_replica(
DBInstanceIdentifier='mydb-read-replica',
SourceDBInstanceIdentifier='mydb-primary',
DBInstanceClass='db.r5.xlarge', # Memory-optimized for caching
PubliclyAccessible=False,
Tags=[
{'Key': 'Purpose', 'Value': 'reporting'},
{'Key': 'Environment', 'Value': 'production'}
]
)
## Update application to route read queries to replica
## In application config:
## DB_WRITE_ENDPOINT = 'mydb-primary.abcdef.us-east-1.rds.amazonaws.com'
## DB_READ_ENDPOINT = 'mydb-read-replica.abcdef.us-east-1.rds.amazonaws.com'
Results:
- Average response time: 500ms → 80ms (6.25x faster)
- Peak response time: 2s → 200ms (10x faster)
- Primary database CPU: 75% → 40%
- Minimal application change: read-only queries simply point at the replica endpoint
Example 4: Serverless Architecture Cost Comparison
Scenario: API backend serving 10 million requests/month.
Option A: EC2-based (Current)
## Always-on instances
## 3x m5.large instances behind ALB
## Cost: $219/month (instances) + $23/month (ALB) = $242/month
Option B: Serverless (Proposed)
## AWS Lambda + API Gateway
import json
import boto3

# Create the DynamoDB resource at module scope so it is reused across invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('Users')

def lambda_handler(event, context):
    # Look up the requested user
    user_id = event['pathParameters']['userId']
    response = table.get_item(Key={'userId': user_id})
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(response.get('Item', {}))
    }
## Cost calculation:
## Lambda: 10M requests × $0.20/1M = $2.00
## Lambda compute: 10M × 200ms × $0.0000166667/GB-sec = $33.33 (1GB memory)
## API Gateway: 10M requests × $3.50/1M = $35.00
## DynamoDB: 10M reads × $0.25/1M (on-demand) = $2.50
## Total: $72.83/month (70% savings)
Cost Comparison:
| Component | EC2 Option | Serverless Option |
|---|---|---|
| Compute | $219/month | $35.33/month |
| Load Balancer / API Gateway | $23/month | $35/month |
| Database | RDS: $115/month | DynamoDB: $2.50/month |
| Total Monthly | $357 | $72.83 |
| Annual Savings (switching to serverless) | | $3,410 |
Additional Benefits:
- Zero server management
- Automatic scaling
- Pay only for actual usage
- Built-in high availability
⚠️ Consideration: Serverless works best for sporadic or variable workloads. For consistent high-volume traffic (>50M requests/month), EC2 with Reserved Instances may be more cost-effective; the rough break-even sketch below shows why.
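A minimal break-even sketch, reusing the illustrative rates from the comparison above (Lambda and API Gateway request pricing, 200ms at 1GB per request):
## Rough break-even: Lambda + API Gateway vs a fixed EC2 fleet (illustrative rates)
EC2_FIXED_MONTHLY = 242.0  # 3x m5.large + ALB, runs regardless of traffic

def serverless_monthly(requests_millions, avg_ms=200, memory_gb=1.0):
    lambda_requests = requests_millions * 0.20                      # $0.20 per 1M requests
    lambda_compute = (requests_millions * 1_000_000
                      * (avg_ms / 1000) * memory_gb * 0.0000166667) # GB-second pricing
    api_gateway = requests_millions * 3.50                          # $3.50 per 1M requests
    return lambda_requests + lambda_compute + api_gateway

for millions in (1, 10, 25, 50, 100):
    cost = serverless_monthly(millions)
    winner = 'serverless' if cost < EC2_FIXED_MONTHLY else 'EC2'
    print(f"{millions:>4}M requests/month: serverless ≈ ${cost:,.2f} -> {winner} is cheaper")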
Common Mistakes & How to Avoid Them
Mistake 1: Not Using Tags for Cost Allocation
❌ Wrong:
## Launch instance without tags
ec2.run_instances(
ImageId='ami-0abcdef1234567890',
InstanceType='t3.medium',
MinCount=1,
MaxCount=1
)
✅ Right:
## Launch with comprehensive tags
ec2.run_instances(
ImageId='ami-0abcdef1234567890',
InstanceType='t3.medium',
MinCount=1,
MaxCount=1,
TagSpecifications=[
{
'ResourceType': 'instance',
'Tags': [
{'Key': 'Name', 'Value': 'web-server-01'},
{'Key': 'Environment', 'Value': 'production'},
{'Key': 'Project', 'Value': 'customer-portal'},
{'Key': 'CostCenter', 'Value': 'engineering'},
{'Key': 'Owner', 'Value': 'alice@example.com'},
{'Key': 'AutoShutdown', 'Value': 'false'}
]
}
]
)
Why it matters: Without tags, you can't track spending by team, project, or environment. Cost allocation becomes impossible.
Mistake 2: Leaving Resources Running 24/7
❌ Wrong: Development and test environments run continuously, even nights and weekends.
✅ Right:
## Lambda function to stop dev/test instances nights and weekends
import boto3
from datetime import datetime
def lambda_handler(event, context):
    ec2 = boto3.client('ec2')

    # Get current time (Lambda clocks run in UTC; adjust the hours for your timezone)
    now = datetime.now()
    hour = now.hour
    weekday = now.weekday()  # 0=Monday, 6=Sunday

    # Define schedule: stop 7PM-7AM weekdays, all day weekends
    should_stop = (
        hour < 7 or hour >= 19  # Outside 7AM-7PM
    ) or (
        weekday >= 5  # Weekend (Sat/Sun)
    )

    if should_stop:
        # Find running instances tagged for auto-shutdown
        instances = ec2.describe_instances(
            Filters=[
                {'Name': 'tag:AutoShutdown', 'Values': ['true']},
                {'Name': 'instance-state-name', 'Values': ['running']}
            ]
        )
        instance_ids = []
        for reservation in instances['Reservations']:
            for instance in reservation['Instances']:
                instance_ids.append(instance['InstanceId'])

        if instance_ids:
            ec2.stop_instances(InstanceIds=instance_ids)
            print(f"Stopped {len(instance_ids)} instances")

    return {'statusCode': 200}
## Schedule with EventBridge: cron(0 7,19 * * ? *)
## Runs at 7AM and 7PM daily
Savings: roughly 65% reduction for dev/test environments (running about 60 hours/week instead of 168).
Mistake 3: Not Monitoring Burst Credits on T-Instances
❌ Wrong: Using t3.medium for an application that consistently needs high CPU, leading to credit exhaustion and throttling.
✅ Right:
## Create CloudWatch alarm for low CPU credits
import boto3
cloudwatch = boto3.client('cloudwatch')
cloudwatch.put_metric_alarm(
AlarmName='low-cpu-credits-web-01',
ComparisonOperator='LessThanThreshold',
EvaluationPeriods=2,
MetricName='CPUCreditBalance',
Namespace='AWS/EC2',
Period=300,
Statistic='Average',
Threshold=100.0, # Alert when credits drop below 100
ActionsEnabled=True,
AlarmActions=['arn:aws:sns:us-east-1:123456789012:ops-alerts'],
AlarmDescription='Alert when T-instance running low on CPU credits',
Dimensions=[
{'Name': 'InstanceId', 'Value': 'i-0123456789abcdef0'}
]
)
Better Solution: Switch to M-series for consistent CPU needs, or enable T3 Unlimited mode:
aws ec2 modify-instance-credit-specification \
    --instance-credit-specifications "InstanceId=i-0123456789abcdef0,CpuCredits=unlimited"
Mistake 4: Using On-Demand for Predictable Workloads
❌ Wrong:
## Running the database at on-demand pricing
## db.r5.xlarge on-demand (illustrative rate): $0.252/hour = $183.96/month
✅ Right:
## Purchase a Reserved Instance for the database
## db.r5.xlarge 1-year partial upfront (illustrative rate): $0.155/hour = $113.15/month
## Savings: $70.81/month (~$850/year)
rds = boto3.client('rds')
## Purchase RDS Reserved Instance
response = rds.purchase_reserved_db_instances_offering(
ReservedDBInstancesOfferingId='offering-12345678',
ReservedDBInstanceId='my-reserved-db',
DBInstanceCount=1
)
Mistake 5: Not Using S3 Lifecycle Policies
❌ Wrong: Storing all logs in S3 Standard forever.
✅ Right:
{
"Rules": [
{
"Id": "intelligent-log-management",
"Status": "Enabled",
"Filter": {"Prefix": "logs/"},
"Transitions": [
{"Days": 30, "StorageClass": "STANDARD_IA"},
{"Days": 90, "StorageClass": "GLACIER_IR"},
{"Days": 365, "StorageClass": "DEEP_ARCHIVE"}
],
"Expiration": {"Days": 2555},
"NoncurrentVersionExpiration": {"NoncurrentDays": 30}
}
]
}
Savings Example (per 1TB batch of logs retained for 7 years, approximate us-east-1 rates):
- S3 Standard the whole time: $0.023/GB ≈ $23/month × 84 months ≈ $1,930
- With the lifecycle policy: ~$23 (month 1, Standard) + ~$13/month (months 2-3, Standard-IA) + ~$4/month (months 4-12, Glacier IR) + ~$1/month (years 2-7, Deep Archive) ≈ $160 total (over 90% savings)
Mistake 6: Ignoring Network Transfer Costs
❌ Wrong: Transferring data between regions unnecessarily.
## Application in us-east-1 reading from S3 in eu-west-1
## 10TB/month × $0.02/GB = $204/month in data transfer fees
✅ Right:
## Use S3 Cross-Region Replication to keep data local
import boto3

s3 = boto3.client('s3')
s3.put_bucket_replication(
Bucket='my-source-bucket',
ReplicationConfiguration={
'Role': 'arn:aws:iam::123456789012:role/s3-replication-role',
'Rules': [
{
'ID': 'replicate-to-us-east-1',
'Status': 'Enabled',
'Priority': 1,
'Filter': {'Prefix': ''},
'DeleteMarkerReplication': {'Status': 'Disabled'},
'Destination': {
'Bucket': 'arn:aws:s3:::my-replica-bucket-us-east-1',
'StorageClass': 'STANDARD_IA' # Use cheaper storage class
}
}
]
}
)
## Application reads from local replica
## One-time replication of the existing 10TB: 10TB × $0.02/GB = $204
## Ongoing: reads are local and free; only newly written objects incur replication transfer
Key Takeaways
Quick Reference Card
Cost Optimization Priorities:
- Right-size first - 30-50% savings typically available
- Use Reserved Instances/Savings Plans - For predictable workloads (up to 72% off)
- Implement Auto Scaling - Pay only for what you need
- Leverage Spot Instances - For fault-tolerant workloads (up to 90% off)
- Apply S3 Lifecycle Policies - Automatically tier storage (up to 95% off)
- Tag everything - Enable cost allocation and tracking
- Set up billing alerts - Catch anomalies early
Performance Optimization Checklist:
- Use appropriate instance family (C/R/M/I/P)
- Enable Enhanced Networking for latency-sensitive apps
- Implement caching layers (ElastiCache, DAX, CloudFront)
- Use read replicas for read-heavy databases
- Deploy resources in multiple AZs for high availability
- Enable CloudWatch detailed monitoring
- Use Placement Groups for tightly coupled workloads
- Optimize database queries and add indexes
Cost & Performance Tools:
| Tool | Purpose | Key Metric |
|---|---|---|
| Cost Explorer | Spending analysis | Monthly cost trends |
| AWS Budgets | Cost alerts | Budget vs actual |
| Compute Optimizer | Right-sizing | Utilization % |
| Trusted Advisor | Best practices | Checks passed/failed |
| CloudWatch | Performance monitoring | Resource metrics |
| Performance Insights | Database tuning | Query load |
Golden Rules:
- Measure before optimizing - Get baseline metrics
- One change at a time - Isolate impact
- Automate everything - Reduce human error
- Review monthly - Costs and performance drift
- Test in non-prod first - Validate changes safely
Memory Device - CRAP²: Cache aggressively, Right-size resources, Automate scaling, Plan for reserved capacity, Performance test continuously.
Final Tip: The AWS Well-Architected Tool provides free assessments. Use it quarterly to identify cost and performance optimization opportunities.
Further Study
AWS Well-Architected Framework - Cost Optimization Pillar
https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html
AWS Cost Optimization Best Practices
https://aws.amazon.com/pricing/cost-optimization/
Amazon CloudWatch User Guide - Performance Metrics
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html
Congratulations! You now understand the core principles of AWS cost and performance engineering. Practice implementing these techniques in your own AWS environment, starting with the low-hanging fruit: tagging resources, setting up billing alerts, and analyzing your Cost Explorer data. Remember: optimization is an ongoing process, not a one-time project.