
Storage & Data Services

Master S3, databases, caching, and data architecture patterns for performance and cost efficiency

AWS Storage and Data Services

Master AWS storage solutions with free flashcards and spaced repetition practice. This lesson covers Amazon S3, EBS volumes, database services like RDS and DynamoDB, data transfer tools, and storage optimization strategies: essential concepts for building scalable cloud architectures and passing AWS certification exams.

Welcome to AWS Storage & Data Services 💾

AWS provides a comprehensive suite of storage and database services designed to handle everything from object storage to fully managed relational databases. Understanding when to use each service is crucial for building efficient, cost-effective cloud solutions. Whether you're storing petabytes of data, running high-performance databases, or transferring massive datasets, AWS has a service optimized for your needs.

Core Concepts: Storage Services 🗄️

Amazon S3 (Simple Storage Service) 🪣

Amazon S3 is AWS's flagship object storage service, offering industry-leading scalability, durability, and performance. S3 stores data as objects (files) within buckets (containers).

Key Features:

  • Durability: 99.999999999% (11 nines) - designed to sustain the concurrent loss of data in two facilities
  • Scalability: Store unlimited objects, each up to 5TB
  • Storage Classes: Optimize costs based on access patterns
  • Versioning: Keep multiple versions of objects
  • Lifecycle Policies: Automatically transition objects between storage classes

| Storage Class                 | Use Case                         | Retrieval Time | Cost           |
|-------------------------------|----------------------------------|----------------|----------------|
| S3 Standard                   | Frequently accessed data         | Milliseconds   | Highest        |
| S3 Intelligent-Tiering        | Unknown/changing access patterns | Milliseconds   | Auto-optimized |
| S3 Standard-IA                | Infrequent access                | Milliseconds   | Lower          |
| S3 One Zone-IA                | Recreatable, infrequent data     | Milliseconds   | Lower          |
| S3 Glacier Instant Retrieval  | Archive, quarterly access        | Milliseconds   | Very low       |
| S3 Glacier Flexible Retrieval | Archive, annual access           | Minutes-hours  | Extremely low  |
| S3 Glacier Deep Archive       | Long-term archive (7-10 years)   | 12-48 hours    | Lowest         |

💡 Pro Tip: Use S3 Intelligent-Tiering when you can't predict access patterns; it automatically moves objects between tiers based on usage.

S3 Security & Access Control:

  • Bucket Policies: JSON-based resource policies
  • IAM Policies: User/role-based permissions
  • Access Control Lists (ACLs): Legacy granular permissions
  • Encryption: Server-side (SSE-S3, SSE-KMS, SSE-C) or client-side
  • Pre-signed URLs: Temporary access to private objects
import boto3
from botocore.exceptions import ClientError

## Create S3 client
s3 = boto3.client('s3')

## Upload file to S3
try:
    s3.upload_file(
        'local_file.txt',
        'my-bucket',
        'remote_file.txt'
    )
    print("Upload successful")
except ClientError as e:
    print(f"Error: {e}")

## Download file from S3
s3.download_file(
    'my-bucket',
    'remote_file.txt',
    'downloaded_file.txt'
)
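
The pre-signed URLs listed above are generated client-side from your existing credentials; a minimal sketch using the same client, with a placeholder bucket and key:

## Generate a temporary URL for a private object (valid for 1 hour)
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-bucket', 'Key': 'remote_file.txt'},
    ExpiresIn=3600
)
print(url)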

Amazon EBS (Elastic Block Store) 💿

Amazon EBS provides persistent block-level storage volumes for EC2 instances. Think of EBS as virtual hard drives that can be attached to your virtual machines.

EBS Volume Types:

| Type              | Name                     | Use Case                       | Max IOPS | Max Throughput |
|-------------------|--------------------------|--------------------------------|----------|----------------|
| gp3               | General Purpose SSD      | Boot volumes, low-latency apps | 16,000   | 1,000 MB/s     |
| gp2               | General Purpose SSD      | Legacy general purpose         | 16,000   | 250 MB/s       |
| io2               | Provisioned IOPS SSD     | Mission-critical, databases    | 64,000   | 1,000 MB/s     |
| io2 Block Express | Highest performance SSD  | Largest databases              | 256,000  | 4,000 MB/s     |
| st1               | Throughput Optimized HDD | Big data, data warehouses      | 500      | 500 MB/s       |
| sc1               | Cold HDD                 | Infrequently accessed data     | 250      | 250 MB/s       |

Key EBS Features:

  • Snapshots: Point-in-time backups stored in S3
  • Encryption: AES-256 encryption at rest
  • Multi-Attach: Attach a single io1/io2 volume to multiple instances in the same AZ
  • Elastic Volumes: Resize, change type without downtime
## AWS CLI: Create EBS volume
aws ec2 create-volume \
    --volume-type gp3 \
    --size 100 \
    --availability-zone us-east-1a \
    --iops 3000 \
    --throughput 125

## Attach volume to instance
aws ec2 attach-volume \
    --volume-id vol-1234567890abcdef0 \
    --instance-id i-1234567890abcdef0 \
    --device /dev/sdf

## Create snapshot
aws ec2 create-snapshot \
    --volume-id vol-1234567890abcdef0 \
    --description "Backup before upgrade"
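
Elastic Volumes (listed above) let you grow or re-type a volume in place; a minimal sketch reusing the example volume ID (after growing, the filesystem still has to be extended inside the OS):

## Increase size and provisioned IOPS without detaching the volume
aws ec2 modify-volume \
    --volume-id vol-1234567890abcdef0 \
    --size 200 \
    --iops 4000

## Track the modification until it reaches "optimizing" or "completed"
aws ec2 describe-volumes-modifications \
    --volume-ids vol-1234567890abcdef0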

⚠️ Common Mistake: EBS volumes are availability zone-specific. You cannot attach an EBS volume in us-east-1a to an instance in us-east-1b. Use snapshots to move data between AZs.

Amazon EFS (Elastic File System) 📁

Amazon EFS provides a scalable, fully managed NFS (Network File System) that can be mounted by multiple EC2 instances simultaneously.

EFS vs EBS:

EBS (Block Storage)          EFS (File Storage)
┌─────────────┐              ┌─────────────┐
│   EC2-1     │              │   EC2-1     │
│  ┌──────┐   │              └──────┬──────┘
│  │ EBS  │   │                     │
│  └──────┘   │                     ▼
└─────────────┘                ┌─────────┐
                               │   EFS   │
┌─────────────┐                └─────────┘
│   EC2-2     │                     ▲
│  ┌──────┐   │                     │
│  │ EBS  │   │              ┌──────┴──────┐
│  └──────┘   │              │   EC2-2     │
└─────────────┘              └─────────────┘
One-to-one                   Many-to-one

EFS Storage Classes:

  • Standard: Frequently accessed files
  • Infrequent Access (IA): Cost-optimized for files not accessed daily
  • Lifecycle Management: Automatically move files to IA after N days
import boto3

## Create EFS file system
efs = boto3.client('efs')

response = efs.create_file_system(
    PerformanceMode='generalPurpose',
    ThroughputMode='elastic',
    Encrypted=True,
    Tags=[
        {'Key': 'Name', 'Value': 'shared-storage'},
    ]
)

fs_id = response['FileSystemId']
print(f"Created file system: {fs_id}")
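
The lifecycle management described above is configured per file system; a minimal sketch that moves files to IA after 30 days without access and back to Standard on first access (the policy values come from the EFS API):

## Transition cold files to Infrequent Access automatically
efs.put_lifecycle_configuration(
    FileSystemId=fs_id,
    LifecyclePolicies=[
        {'TransitionToIA': 'AFTER_30_DAYS'},
        {'TransitionToPrimaryStorageClass': 'AFTER_1_ACCESS'}
    ]
)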

Core Concepts: Database Services 🗃️

Amazon RDS (Relational Database Service) 🔗

Amazon RDS manages relational databases, handling backups, patching, scaling, and replication automatically.

Supported Engines:

  • Amazon Aurora (MySQL & PostgreSQL compatible)
  • MySQL
  • PostgreSQL
  • MariaDB
  • Oracle Database
  • Microsoft SQL Server

Key Features:

  • Automated Backups: Point-in-time recovery up to 35 days
  • Multi-AZ Deployments: Synchronous replication for high availability
  • Read Replicas: Asynchronous replication for read scaling
  • Automatic Failover: Multi-AZ automatically fails over in minutes
RDS MULTI-AZ ARCHITECTURE

┌─────────────────────────────────────┐
│         Availability Zone A         │
│  ┌──────────────────────────────┐   │
│  │  Primary RDS Instance        │   │
│  │  ┌────────────────────────┐  │   │
│  │  │    MySQL/PostgreSQL    │  │   │
│  │  └────────────────────────┘  │   │
│  └──────────────┬───────────────┘   │
└─────────────────┼───────────────────┘
                  │ Synchronous
                  │ Replication
                  ▼
┌─────────────────────────────────────┐
│         Availability Zone B         │
│  ┌──────────────────────────────┐   │
│  │  Standby RDS Instance        │   │
│  │  ┌────────────────────────┐  │   │
│  │  │   (Automatic Failover) │  │   │
│  │  └────────────────────────┘  │   │
│  └──────────────────────────────┘   │
└─────────────────────────────────────┘

Amazon Aurora 🌟

Aurora is AWS's cloud-native database, offering:

  • Up to 5x the throughput of standard MySQL
  • Up to 3x the throughput of standard PostgreSQL
  • Up to 15 read replicas with <10ms replica lag
  • Automatic storage scaling up to 128 TB
  • Global Database: Cross-region replication with <1 second latency
import boto3

rds = boto3.client('rds')

## Create RDS MySQL instance
response = rds.create_db_instance(
    DBInstanceIdentifier='mydb-instance',
    DBInstanceClass='db.t3.micro',
    Engine='mysql',
    MasterUsername='admin',
    MasterUserPassword='SecurePassword123!',
    AllocatedStorage=20,
    StorageType='gp3',
    MultiAZ=True,
    BackupRetentionPeriod=7,
    PubliclyAccessible=False,
    VpcSecurityGroupIds=['sg-12345678'],
    Tags=[
        {'Key': 'Environment', 'Value': 'production'},
    ]
)

print(f"Creating database: {response['DBInstance']['DBInstanceIdentifier']}")
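
Aurora is provisioned as a cluster plus one or more instances rather than a single DB instance; a minimal sketch with placeholder identifiers (the instance class and security group are assumptions):

## Create an Aurora MySQL cluster, then add an instance to it
rds.create_db_cluster(
    DBClusterIdentifier='my-aurora-cluster',
    Engine='aurora-mysql',
    MasterUsername='admin',
    MasterUserPassword='SecurePassword123!',
    VpcSecurityGroupIds=['sg-12345678']
)

rds.create_db_instance(
    DBInstanceIdentifier='my-aurora-instance-1',
    DBInstanceClass='db.r6g.large',
    Engine='aurora-mysql',
    DBClusterIdentifier='my-aurora-cluster'
)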

Amazon DynamoDB ⚡

DynamoDB is a fully managed NoSQL database offering single-digit millisecond performance at any scale.

Key Concepts:

  • Tables: Collection of items (rows)
  • Items: Collection of attributes (columns)
  • Primary Key: Partition key (required) + Sort key (optional)
  • Secondary Indexes: Query on non-key attributes

DynamoDB Capacity Modes:

| Mode        | Use Case                          | Pricing                      |
|-------------|-----------------------------------|------------------------------|
| On-Demand   | Unpredictable workloads, new apps | Per-request pricing          |
| Provisioned | Predictable traffic               | Pay for provisioned capacity |

DynamoDB Features:

  • Global Tables: Multi-region, multi-active replication
  • DynamoDB Streams: Capture item-level changes
  • Time to Live (TTL): Automatically delete expired items
  • Point-in-Time Recovery: Restore to any second in last 35 days
  • DAX (DynamoDB Accelerator): In-memory caching, microsecond latency
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource('dynamodb')

## Create table
table = dynamodb.create_table(
    TableName='Users',
    KeySchema=[
        {'AttributeName': 'user_id', 'KeyType': 'HASH'},  # Partition key
        {'AttributeName': 'timestamp', 'KeyType': 'RANGE'}  # Sort key
    ],
    AttributeDefinitions=[
        {'AttributeName': 'user_id', 'AttributeType': 'S'},
        {'AttributeName': 'timestamp', 'AttributeType': 'N'}
    ],
    BillingMode='PAY_PER_REQUEST'
)

## Put item
table.put_item(
    Item={
        'user_id': 'user123',
        'timestamp': 1234567890,
        'name': 'John Doe',
        'email': 'john@example.com'
    }
)

## Query items
response = table.query(
    KeyConditionExpression=Key('user_id').eq('user123')
)

for item in response['Items']:
    print(item)
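
The TTL feature listed above is enabled per table through the low-level client; a minimal sketch, assuming items carry an epoch-seconds attribute named expires_at (a hypothetical attribute for this example):

## Expire items automatically once their 'expires_at' timestamp passes
dynamodb_client = boto3.client('dynamodb')

dynamodb_client.update_time_to_live(
    TableName='Users',
    TimeToLiveSpecification={
        'Enabled': True,
        'AttributeName': 'expires_at'
    }
)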

Amazon Redshift 📊

Redshift is AWS's fully managed data warehouse service optimized for online analytical processing (OLAP) and business intelligence workloads.

Key Features:

  • Columnar Storage: Optimized for analytical queries
  • Massively Parallel Processing (MPP): Distributes queries across nodes
  • Redshift Spectrum: Query data directly in S3
  • Concurrency Scaling: Automatically adds capacity for concurrent queries
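
Cluster provisioning is scriptable like the other services; a minimal sketch using boto3 with placeholder names (the node type and node count are assumptions):

import boto3

redshift = boto3.client('redshift')

## Create a small two-node analytics cluster
redshift.create_cluster(
    ClusterIdentifier='my-analytics-cluster',
    NodeType='ra3.xlplus',
    ClusterType='multi-node',
    NumberOfNodes=2,
    DBName='analytics',
    MasterUsername='admin',
    MasterUserPassword='SecurePass123!'
)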

🧠 Memory Device: REDshift = REDuced query time for Reporting, Extracting, Data warehouse tasks

Amazon ElastiCache 🚀

ElastiCache provides fully managed in-memory caching with Redis or Memcached.

| Feature         | Redis                                     | Memcached    |
|-----------------|-------------------------------------------|--------------|
| Data Structures | Strings, Lists, Sets, Sorted Sets, Hashes | Strings only |
| Persistence     | Yes (snapshots, AOF)                      | No           |
| Replication     | Yes (multi-AZ)                            | No           |
| Multi-threading | No                                        | Yes          |
| Pub/Sub         | Yes                                       | No           |
import boto3

elasticache = boto3.client('elasticache')

## Create Redis cluster
response = elasticache.create_cache_cluster(
    CacheClusterId='my-redis-cluster',
    CacheNodeType='cache.t3.micro',
    Engine='redis',
    NumCacheNodes=1,
    EngineVersion='7.0',
    Port=6379,
    CacheSubnetGroupName='my-subnet-group',
    SecurityGroupIds=['sg-12345678']
)

Core Concepts: Data Transfer & Migration 🚚

AWS Snow Family ❄️

Physical data transfer devices for moving massive amounts of data into and out of AWS.

| Device                          | Storage | Use Case                        | Data Transfer   |
|---------------------------------|---------|---------------------------------|-----------------|
| Snowcone                        | 8-14 TB | Edge computing, small transfers | Online/Offline  |
| Snowball Edge Storage Optimized | 80 TB   | Large migrations, edge storage  | Offline         |
| Snowball Edge Compute Optimized | 42 TB   | Edge computing, ML              | Offline         |
| Snowmobile                      | 100 PB  | Exabyte-scale transfers         | Offline (truck) |

SNOW FAMILY SIZE COMPARISON

   Snowcone          Snowball Edge       Snowmobile
   ┌─────┐           ┌───────────┐          🚛
   │ 📦  │           │  📦📦📦   │     ┌─────────┐
   └─────┘           │  📦📦📦   │     │📦📦📦📦📦│
    8 TB             └───────────┘     │📦📦📦📦📦│
  (portable)            80 TB          │📦📦📦📦📦│
                    (ruggedized)       └─────────┘
                                         100 PB
                                      (semi-truck)

💡 Pro Tip: Use the Snow Family when transferring more than 10TB or when network bandwidth is limited. Rule of thumb: If it takes longer to transfer over the network than to ship a device, use Snow.

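The rule of thumb is easy to sanity-check with quick arithmetic; a sketch assuming a hypothetical 1 Gbps link at 80% sustained utilization:

## Compare network transfer time against shipping a Snowball
data_tb = 50          # data to move, in terabytes (assumption)
link_gbps = 1         # available bandwidth (assumption)
utilization = 0.8     # realistic sustained utilization (assumption)

data_bits = data_tb * 1e12 * 8
seconds = data_bits / (link_gbps * 1e9 * utilization)
print(f"~{seconds / 86400:.1f} days over the network")  # roughly 5-6 days

## A Snowball round trip is typically on the order of a week, so at 50 TB and
## 1 Gbps the options are comparable; with less bandwidth or more data,
## shipping a device wins.
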
AWS DataSync 🔄

DataSync automates and accelerates data transfer between on-premises storage and AWS.

Key Features:

  • Up to 10x faster than open-source tools
  • Automated scheduling: Set up recurring transfers
  • Data validation: Verifies data integrity
  • Bandwidth throttling: Control network impact

Supported Destinations:

  • Amazon S3
  • Amazon EFS
  • Amazon FSx for Windows File Server
  • Amazon FSx for Lustre
import boto3

datasync = boto3.client('datasync')

## Create DataSync task
response = datasync.create_task(
    SourceLocationArn='arn:aws:datasync:us-east-1:123456789012:location/loc-abcdef',
    DestinationLocationArn='arn:aws:datasync:us-east-1:123456789012:location/loc-123456',
    CloudWatchLogGroupArn='arn:aws:logs:us-east-1:123456789012:log-group:/aws/datasync',
    Name='OnPremToS3Transfer',
    Options={
        'VerifyMode': 'POINT_IN_TIME_CONSISTENT',
        'OverwriteMode': 'ALWAYS',
        'TransferMode': 'CHANGED'
    }
)
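
Once the task exists it can be run on demand as well as on a schedule; a minimal sketch reusing the task ARN returned above:

## Start an execution of the task created above
execution = datasync.start_task_execution(
    TaskArn=response['TaskArn']
)
print(f"Started execution: {execution['TaskExecutionArn']}")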

AWS Storage Gateway 🌉

Storage Gateway provides hybrid cloud storage integration, connecting on-premises environments with AWS storage.

Gateway Types:

| Type                    | Protocol  | Use Case                                             | Caching       |
|-------------------------|-----------|------------------------------------------------------|---------------|
| File Gateway            | NFS, SMB  | File shares backed by S3                             | Yes           |
| Volume Gateway (Stored) | iSCSI     | Primary data on-premises, async backup to S3         | Full local    |
| Volume Gateway (Cached) | iSCSI     | Primary data in S3, frequently accessed data cached locally | Partial local |
| Tape Gateway            | iSCSI VTL | Replace physical tape backups                        | N/A           |

STORAGE GATEWAY ARCHITECTURE

┌─────────────────────────────────┐
│     On-Premises Data Center     │
│                                 │
│  ┌──────────┐    ┌───────────┐  │
│  │   App    │───→│  Storage  │  │
│  │ Servers  │    │  Gateway  │  │
│  └──────────┘    └─────┬─────┘  │
│                        │        │
└────────────────────────┼────────┘
                         │ HTTPS
                         │
          ┌──────────────┼──────────────┐
          │          AWS Cloud          │
          │              │              │
          │  ┌───────────▼──────────┐   │
          │  │      Amazon S3       │   │
          │  │    (Data Storage)    │   │
          │  └──────────────────────┘   │
          │                             │
          └─────────────────────────────┘
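
Gateways are usually activated through the console, but file shares can be scripted afterwards; a rough sketch for an S3 File Gateway NFS share, assuming an already-activated gateway and placeholder ARNs and role names:

import boto3
import uuid

storagegateway = boto3.client('storagegateway')

## Create an NFS file share on an existing, activated File Gateway
share = storagegateway.create_nfs_file_share(
    ClientToken=str(uuid.uuid4()),
    GatewayARN='arn:aws:storagegateway:us-east-1:123456789012:gateway/sgw-12345678',
    Role='arn:aws:iam::123456789012:role/StorageGatewayS3Role',
    LocationARN='arn:aws:s3:::my-file-share-bucket'
)
print(f"File share: {share['FileShareARN']}")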

Example 1: Building a Three-Tier Web Application Storage Architecture 🏗️

Scenario: You're building a photo-sharing application that needs to store user uploads, serve static content, cache frequently accessed data, and maintain user metadata.

Architecture Design:

┌─────────────────────────────────────────────────┐
│              Users (Web/Mobile)                 │
└────────────┬────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────┐
│        CloudFront (CDN) + S3 Static Hosting     │  ← Static assets
└────────────┬────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────┐
│            Application Load Balancer            │
└────────────┬────────────────────────────────────┘
             │
             ▼
┌─────────────────────────────────────────────────┐
│         EC2 Auto Scaling Group                  │
│  ┌──────┐  ┌──────┐  ┌──────┐                   │
│  │ EC2  │  │ EC2  │  │ EC2  │                   │
│  └───┬──┘  └───┬──┘  └───┬──┘                   │
└──────┼─────────┼─────────┼──────────────────────┘
       │         │         │
       ├─────────┼─────────┼────→ ElastiCache (Redis)  ← Session storage
       │         │         │
       ├─────────┼─────────┼────→ S3 (Standard)        ← User uploads
       │         │         │
       └─────────┴─────────┴────→ RDS (Multi-AZ)       ← User metadata

Storage Component Breakdown:

  1. S3 Standard for user photo uploads:
import boto3
import uuid

s3 = boto3.client('s3')

def upload_photo(file_data, user_id):
    photo_id = str(uuid.uuid4())
    key = f"photos/{user_id}/{photo_id}.jpg"
    
    s3.put_object(
        Bucket='photo-app-uploads',
        Key=key,
        Body=file_data,
        ContentType='image/jpeg',
        ServerSideEncryption='AES256',
        StorageClass='STANDARD'
    )
    
    return f"https://photo-app-uploads.s3.amazonaws.com/{key}"
  2. S3 Lifecycle Policy to optimize costs:
lifecycle_config = {
    'Rules': [
        {
            'Id': 'MoveOldPhotosToIA',
            'Status': 'Enabled',
            'Transitions': [
                {
                    'Days': 90,
                    'StorageClass': 'STANDARD_IA'
                },
                {
                    'Days': 180,
                    'StorageClass': 'GLACIER_IR'
                }
            ],
            'Filter': {'Prefix': 'photos/'}
        }
    ]
}

s3.put_bucket_lifecycle_configuration(
    Bucket='photo-app-uploads',
    LifecycleConfiguration=lifecycle_config
)
  3. ElastiCache Redis for session management and frequently accessed metadata:
import redis
import json

## Connect to ElastiCache Redis
redis_client = redis.Redis(
    host='my-cluster.cache.amazonaws.com',
    port=6379,
    decode_responses=True
)

def cache_user_profile(user_id, profile_data):
    key = f"user:{user_id}:profile"
    redis_client.setex(
        key,
        3600,  # TTL: 1 hour
        json.dumps(profile_data)
    )

def get_cached_profile(user_id):
    key = f"user:{user_id}:profile"
    cached = redis_client.get(key)
    return json.loads(cached) if cached else None
  4. RDS MySQL for structured user data with read replicas:
import pymysql

## Primary database connection
primary_db = pymysql.connect(
    host='mydb.cluster-abc123.us-east-1.rds.amazonaws.com',
    user='admin',
    password='SecurePass123!',
    database='photoapp'
)

## Read replica connection for queries
replica_db = pymysql.connect(
    host='mydb.cluster-ro-abc123.us-east-1.rds.amazonaws.com',
    user='admin',
    password='SecurePass123!',
    database='photoapp'
)

def get_user_photos(user_id):
    cursor = replica_db.cursor()
    cursor.execute(
        "SELECT photo_id, s3_key, created_at FROM photos WHERE user_id = %s ORDER BY created_at DESC",
        (user_id,)
    )
    return cursor.fetchall()

Cost Optimization Strategy:

  • Photos older than 90 days → S3 Standard-IA (roughly half the storage cost of S3 Standard)
  • Photos older than 180 days → Glacier Instant Retrieval (about 68% cheaper than Standard-IA)
  • Cache frequently accessed data in Redis (reduce database load)
  • Use read replicas for read-heavy operations

Example 2: Data Migration from On-Premises to AWS 📦

Scenario: Migrate 50TB of on-premises data to AWS, with ongoing synchronization during the migration period.

Migration Strategy:

MIGRATION PHASES

Phase 1: Bulk Transfer (Snowball)      Phase 2: Sync (DataSync)
┌─────────────────────────────┐        ┌────────────────────────┐
│   On-Premises Data Center   │        │   On-Premises (Active) │
│                             │        │                        │
│  ┌────────────────────┐     │        │  ┌────────────────┐    │
│  │   50 TB Storage    │     │        │  │   New Data     │    │
│  └────────┬───────────┘     │        │  └────────┬───────┘    │
│           │                 │        │           │            │
│           ▼                 │        │           ▼            │
│  ┌────────────────────┐     │        │  ┌────────────────┐    │
│  │  Snowball Device   │     │        │  │   DataSync     │───→│
│  └────────┬───────────┘     │        │  │    Agent       │    │
│           │                 │        │  └────────────────┘    │
└───────────┼─────────────────┘        └────────────────────────┘
            │ Ship                                   │ HTTPS
            ▼                                        ▼
┌─────────────────────────────┐        ┌────────────────────────┐
│         AWS Cloud           │        │      AWS Cloud         │
│  ┌────────────────────┐     │        │  ┌────────────────┐    │
│  │   Import to S3     │     │        │  │  Sync to S3    │    │
│  └────────────────────┘     │        │  └────────────────┘    │
└─────────────────────────────┘        └────────────────────────┘
  (1-2 weeks, offline)                  (Ongoing, online)

Step-by-Step Implementation:

Step 1: Order and configure Snowball

## AWS CLI: Create Snowball job
aws snowball create-job \
    --job-type IMPORT \
    --resources S3Resources=[{BucketArn=arn:aws:s3:::my-migration-bucket}] \
    --address-id ADID1234-5678-90ab-cdef-1234567890ab \
    --shipping-option SECOND_DAY \
    --snowball-capacity-preference T80

Step 2: Copy data to Snowball (on-premises)

## On-premises: Using Snowball client
snowball cp /mnt/data/ s3://my-migration-bucket/ --recursive

Step 3: Ship Snowball back to AWS

  • AWS receives device
  • Data automatically imported to S3
  • Snowball securely erased

Step 4: Set up DataSync for ongoing sync

import boto3

datasync = boto3.client('datasync')

## Create source location (on-premises)
source_response = datasync.create_location_smb(
    ServerHostname='10.0.1.50',
    Subdirectory='/data',
    User='datasync-user',
    Password='SecurePassword123!',
    AgentArns=['arn:aws:datasync:us-east-1:123456789012:agent/agent-abc123']
)

## Create destination location (S3)
dest_response = datasync.create_location_s3(
    S3BucketArn='arn:aws:s3:::my-migration-bucket',
    S3Config={
        'BucketAccessRoleArn': 'arn:aws:iam::123456789012:role/DataSyncS3Role'
    }
)

## Create sync task
task_response = datasync.create_task(
    SourceLocationArn=source_response['LocationArn'],
    DestinationLocationArn=dest_response['LocationArn'],
    Options={
        'VerifyMode': 'POINT_IN_TIME_CONSISTENT',
        'TransferMode': 'CHANGED',
        'PreserveDeletedFiles': 'PRESERVE'
    },
    Schedule={
        'ScheduleExpression': 'rate(1 hour)'
    }
)

print(f"Migration task created: {task_response['TaskArn']}")

Step 5: Monitor and validate

## Check task execution status
executions = datasync.list_task_executions(
    TaskArn=task_response['TaskArn']
)

for execution in executions['TaskExecutions']:
    details = datasync.describe_task_execution(
        TaskExecutionArn=execution['TaskExecutionArn']
    )
    print(f"Status: {details['Status']}")
    print(f"Files transferred: {details['FilesTransferred']}")
    print(f"Bytes transferred: {details['BytesTransferred']}")

Example 3: Serverless Data Processing Pipeline 🔄

Scenario: Process uploaded CSV files, transform data, and load into a data warehouse for analytics.

Architecture:

SERVERLESS DATA PIPELINE

  ┌──────────┐
  │  User    │
  │  Upload  │
  └────┬─────┘
       │
       ▼
┌─────────────┐         ┌──────────────┐
│   S3 Bucket │────────→│   Lambda     │  S3 Event
│   (raw/)    │  Event  │  (Transform) │  Trigger
└─────────────┘         └──────┬───────┘
                               │
                               ▼
                        ┌──────────────┐
                        │   S3 Bucket  │
                        │ (processed/) │
                        └──────┬───────┘
                               │
                               ▼
                        ┌──────────────┐
                        │   Lambda     │
                        │   (Load)     │
                        └──────┬───────┘
                               │
                               ▼
                        ┌──────────────┐
                        │   Redshift   │
                        │Data Warehouse│
                        └──────────────┘

Implementation:

Lambda Function 1: Transform CSV

import boto3
import csv
import json
from io import StringIO

s3 = boto3.client('s3')

def lambda_handler(event, context):
    # Get uploaded file details
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    
    # Read CSV from S3
    response = s3.get_object(Bucket=bucket, Key=key)
    csv_content = response['Body'].read().decode('utf-8')
    
    # Transform data
    reader = csv.DictReader(StringIO(csv_content))
    transformed_data = []
    
    for row in reader:
        transformed_row = {
            'user_id': row['id'],
            'email': row['email'].lower(),
            'signup_date': row['created'],
            'is_active': row['status'] == 'active'
        }
        transformed_data.append(transformed_row)
    
    # Write JSON to processed bucket
    output_key = key.replace('raw/', 'processed/').replace('.csv', '.json')
    s3.put_object(
        Bucket='my-data-bucket',
        Key=output_key,
        Body=json.dumps(transformed_data),
        ContentType='application/json'
    )
    
    return {
        'statusCode': 200,
        'body': f'Processed {len(transformed_data)} records'
    }

Lambda Function 2: Load to Redshift

import boto3
import psycopg2  # must be packaged with the function (e.g., via a Lambda layer)
import json

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = event['Records'][0]['s3']['object']['key']
    
    # Connect to Redshift
    conn = psycopg2.connect(
        host='my-cluster.abc123.us-east-1.redshift.amazonaws.com',
        port=5439,
        dbname='analytics',
        user='admin',
        password='SecurePass123!'
    )
    cursor = conn.cursor()
    
    # Use COPY command for efficient bulk load
    copy_sql = f"""
        COPY users_staging
        FROM 's3://{bucket}/{key}'
        IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
        JSON 'auto'
        TIMEFORMAT 'auto';
    """
    
    cursor.execute(copy_sql)
    conn.commit()
    
    cursor.close()
    conn.close()
    
    return {
        'statusCode': 200,
        'body': 'Data loaded to Redshift'
    }

S3 Event Configuration:

import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

## Grant S3 permission to invoke Lambda
lambda_client.add_permission(
    FunctionName='transform-csv-function',
    StatementId='s3-invoke-permission',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn='arn:aws:s3:::my-data-bucket'
)

## Configure S3 event notification
s3.put_bucket_notification_configuration(
    Bucket='my-data-bucket',
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': 'arn:aws:lambda:us-east-1:123456789012:function:transform-csv-function',
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {
                        'FilterRules': [
                            {'Name': 'prefix', 'Value': 'raw/'},
                            {'Name': 'suffix', 'Value': '.csv'}
                        ]
                    }
                }
            }
        ]
    }
)

Example 4: Disaster Recovery with Cross-Region Replication 🌍

Scenario: Implement a disaster recovery strategy with RPO (Recovery Point Objective) of 15 minutes and RTO (Recovery Time Objective) of 1 hour.

Architecture:

CROSS-REGION DISASTER RECOVERY

┌─────────────────────────────────────────┐
│       PRIMARY REGION (us-east-1)        │
│                                         │
│  ┌──────────────┐    ┌──────────────┐   │
│  │  RDS Primary │───→│ RDS Replica  │   │
│  │  Multi-AZ    │    │  (Read)      │   │
│  └──────┬───────┘    └──────────────┘   │
│         │ Snapshots                     │
│         ▼                               │
│  ┌──────────────┐                       │
│  │  S3 Bucket   │←──── Automated        │
│  │  (Primary)   │      Snapshots        │
│  └──────┬───────┘                       │
└─────────┼───────────────────────────────┘
          │ CRR (Cross-Region
          │ Replication)
          ▼
┌─────────────────────────────────────────┐
│       BACKUP REGION (us-west-2)         │
│                                         │
│  ┌──────────────┐    ┌──────────────┐   │
│  │  S3 Bucket   │    │  RDS Standby │   │
│  │  (Replica)   │───→│  (Restored   │   │
│  │              │    │   from snap) │   │
│  └──────────────┘    └──────────────┘   │
│                                         │
│  ┌──────────────┐                       │
│  │  DynamoDB    │←──── Global Tables    │
│  │  Replica     │                       │
│  └──────────────┘                       │
└─────────────────────────────────────────┘

S3 Cross-Region Replication:

import boto3

s3 = boto3.client('s3')

## Enable versioning (required for CRR)
s3.put_bucket_versioning(
    Bucket='my-primary-bucket',
    VersioningConfiguration={'Status': 'Enabled'}
)

s3.put_bucket_versioning(
    Bucket='my-backup-bucket',
    VersioningConfiguration={'Status': 'Enabled'}
)

## Configure cross-region replication
replication_config = {
    'Role': 'arn:aws:iam::123456789012:role/S3ReplicationRole',
    'Rules': [
        {
            'ID': 'ReplicateEverything',
            'Status': 'Enabled',
            'Priority': 1,
            'Filter': {},
            'Destination': {
                'Bucket': 'arn:aws:s3:::my-backup-bucket',
                'ReplicationTime': {
                    'Status': 'Enabled',
                    'Time': {'Minutes': 15}
                },
                'Metrics': {
                    'Status': 'Enabled',
                    'EventThreshold': {'Minutes': 15}
                },
                'StorageClass': 'STANDARD_IA'
            },
            'DeleteMarkerReplication': {'Status': 'Enabled'}
        }
    ]
}

s3.put_bucket_replication(
    Bucket='my-primary-bucket',
    ReplicationConfiguration=replication_config
)

DynamoDB Global Tables:

import boto3

dynamodb = boto3.client('dynamodb')

## Create a global table (2017-version API): the table must already exist with
## the same name and DynamoDB Streams enabled in each listed region
response = dynamodb.create_global_table(
    GlobalTableName='users-global',
    ReplicationGroup=[
        {'RegionName': 'us-east-1'},
        {'RegionName': 'us-west-2'}
    ]
)

print(f"Global table created: {response['GlobalTableDescription']['GlobalTableName']}")

Automated RDS Snapshot Copy:

import boto3

rds = boto3.client('rds', region_name='us-east-1')
rds_backup = boto3.client('rds', region_name='us-west-2')

def copy_rds_snapshot(snapshot_id):
    # Copy snapshot to the backup region; SourceRegion lets boto3 generate the
    # pre-signed URL needed when copying encrypted snapshots across regions
    response = rds_backup.copy_db_snapshot(
        SourceDBSnapshotIdentifier=f'arn:aws:rds:us-east-1:123456789012:snapshot:{snapshot_id}',
        TargetDBSnapshotIdentifier=f'{snapshot_id}-backup',
        SourceRegion='us-east-1',
        KmsKeyId='arn:aws:kms:us-west-2:123456789012:key/abcd1234',
        CopyTags=True
    )
    return response['DBSnapshot']['DBSnapshotIdentifier']

## Lambda function triggered by RDS snapshot creation event
def lambda_handler(event, context):
    snapshot_id = event['detail']['SourceIdentifier']
    backup_snapshot = copy_rds_snapshot(snapshot_id)
    print(f"Copied snapshot to backup region: {backup_snapshot}")

Common Mistakes ⚠️

1. Ignoring Lifecycle Transition Constraints and Minimum Storage Durations

❌ Wrong: Archiving objects aggressively without accounting for minimum storage durations

## Transitioning straight to Deep Archive this early is allowed, but Deep Archive
## has a 180-day minimum storage duration - objects deleted or overwritten sooner
## are still billed for the full period. (Standard-IA also requires objects to be
## at least 30 days old before they can be transitioned to it.)
lifecycle_config = {
    'Rules': [{
        'Transitions': [{
            'Days': 30,
            'StorageClass': 'DEEP_ARCHIVE'  # ❌ Billed for 180 days even if deleted sooner
        }]
    }]
}

✅ Right: Step down through storage classes in line with the minimum durations

lifecycle_config = {
    'Rules': [{
        'Transitions': [
            {'Days': 30, 'StorageClass': 'STANDARD_IA'},
            {'Days': 90, 'StorageClass': 'GLACIER_IR'},
            {'Days': 180, 'StorageClass': 'DEEP_ARCHIVE'}
        ]
    }]
}

2. Forgetting EBS Volumes are AZ-Locked

❌ Wrong: Trying to attach EBS volume from different AZ

## Instance in us-east-1a, volume in us-east-1b
aws ec2 attach-volume \
    --volume-id vol-abc123 \
    --instance-id i-def456  # ❌ Will fail!

✅ Right: Create snapshot, then create volume in target AZ

## Create snapshot
aws ec2 create-snapshot --volume-id vol-abc123

## Create volume in correct AZ from snapshot
aws ec2 create-volume \
    --snapshot-id snap-xyz789 \
    --availability-zone us-east-1a

3. Not Enabling Point-in-Time Recovery for DynamoDB

❌ Wrong: Relying only on on-demand backups

## Manual backups only
dynamodb.create_backup(
    TableName='users',
    BackupName='manual-backup'
)

✅ Right: Enable continuous backups with PITR

## Enable point-in-time recovery
dynamodb.update_continuous_backups(
    TableName='users',
    PointInTimeRecoverySpecification={
        'PointInTimeRecoveryEnabled': True
    }
)
## Now you can restore to any second in last 35 days
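
With PITR enabled, a restore creates a new table from any point in the retention window; a minimal sketch with a placeholder target table name:

## Restore to the latest restorable time as a new table
dynamodb.restore_table_to_point_in_time(
    SourceTableName='users',
    TargetTableName='users-restored',
    UseLatestRestorableTime=True
)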

4. Using Wrong Read Consistency for DynamoDB

❌ Wrong: Using eventually consistent reads when strong consistency required

## Eventually consistent read (may return stale data)
response = table.get_item(
    Key={'user_id': 'user123'}
    # ConsistentRead defaults to False
)

✅ Right: Use strongly consistent reads for critical data

## Strongly consistent read (always returns latest data)
response = table.get_item(
    Key={'user_id': 'user123'},
    ConsistentRead=True  # ✅ Guarantees latest data
)

5. Not Encrypting Sensitive Data at Rest

❌ Wrong: Storing sensitive data without encryption

s3.put_object(
    Bucket='user-data',
    Key='ssn-records.csv',
    Body=sensitive_data
    # ❌ No encryption specified!
)

✅ Right: Always encrypt sensitive data

s3.put_object(
    Bucket='user-data',
    Key='ssn-records.csv',
    Body=sensitive_data,
    ServerSideEncryption='aws:kms',  # ✅ KMS encryption
    SSEKMSKeyId='arn:aws:kms:us-east-1:123456789012:key/abc123'
)

6. Choosing Wrong RDS Instance Size

🧠 Memory Device: RAMI - Read patterns, Availability needs, Memory requirements, IOPS demands

❌ Wrong: Undersizing database instance

  • High CPU utilization (>80% sustained)
  • Memory swapping
  • Connection pool exhaustion

✅ Right: Monitor CloudWatch metrics and scale appropriately

import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client('cloudwatch')

## Check average CPU utilization over the last hour
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/RDS',
    MetricName='CPUUtilization',
    Dimensions=[{'Name': 'DBInstanceIdentifier', 'Value': 'mydb'}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=['Average']
)

datapoints = response['Datapoints']
if datapoints:
    avg_cpu = sum(point['Average'] for point in datapoints) / len(datapoints)
    if avg_cpu > 80:
        print("⚠️ Consider scaling up RDS instance!")

7. Not Using S3 Transfer Acceleration for Large Files

❌ Wrong: Uploading large files directly to S3 from distant regions

## Slow upload from Asia to us-east-1
s3.upload_file('large_video.mp4', 'my-bucket', 'video.mp4')

✅ Right: Enable Transfer Acceleration for faster uploads

## Enable Transfer Acceleration on the bucket
s3.put_bucket_accelerate_configuration(
    Bucket='my-bucket',
    AccelerateConfiguration={'Status': 'Enabled'}
)

## Use the accelerated endpoint
from botocore.config import Config

s3_accelerated = boto3.client(
    's3',
    config=Config(s3={'use_accelerate_endpoint': True})
)

s3_accelerated.upload_file('large_video.mp4', 'my-bucket', 'video.mp4')
## ✅ Up to 50-500% faster from distant locations

Key Takeaways 🎯

📋 AWS Storage & Data Services Quick Reference

| Service     | Type              | Best For                            | Key Feature                 |
|-------------|-------------------|-------------------------------------|-----------------------------|
| S3          | Object Storage    | Static content, backups, data lakes | 11 nines durability         |
| EBS         | Block Storage     | EC2 boot/data volumes               | Snapshots, encryption       |
| EFS         | File Storage      | Shared file systems                 | Multi-instance access       |
| RDS         | Relational DB     | Structured data, transactions       | Automated backups, Multi-AZ |
| Aurora      | Cloud-Native DB   | High-performance SQL                | 5x MySQL performance        |
| DynamoDB    | NoSQL DB          | Key-value, millisecond latency      | Global Tables               |
| Redshift    | Data Warehouse    | Analytics, BI                       | Columnar storage, MPP       |
| ElastiCache | In-Memory Cache   | Session storage, caching            | Microsecond latency         |
| Snow Family | Physical Transfer | Large-scale migrations              | Offline data transfer       |
| DataSync    | Online Transfer   | Automated sync                      | 10x faster than open-source |

🎓 Study Tips:

  • S3 lifecycle transitions: Standard → Standard-IA (30d) → Glacier (90d) → Deep Archive (180d)
  • EBS vs EFS: EBS = one instance, EFS = many instances
  • RDS Multi-AZ: Synchronous replication for high availability
  • RDS Read Replicas: Asynchronous replication for read scaling
  • DynamoDB consistency: Eventually consistent (default) vs Strongly consistent
  • Choose Snowball when: Data > 10TB or network transfer time > shipping time

🔧 Try This: Hands-On Practice

Challenge 1: Create an S3 bucket with lifecycle policies

BUCKET=my-practice-bucket-$(date +%s)
aws s3 mb s3://$BUCKET
aws s3api put-bucket-lifecycle-configuration \
    --bucket $BUCKET \
    --lifecycle-configuration file://lifecycle.json

Challenge 2: Launch an RDS instance with read replica

## Create primary
aws rds create-db-instance \
    --db-instance-identifier practice-db-primary \
    --db-instance-class db.t3.micro \
    --engine mysql \
    --allocated-storage 20 \
    --master-username admin \
    --master-user-password TempPass123!

## Create read replica (after primary is available)
aws rds create-db-instance-read-replica \
    --db-instance-identifier practice-db-replica \
    --source-db-instance-identifier practice-db-primary

Challenge 3: Set up DynamoDB table with auto-scaling

import boto3

dynamodb = boto3.client('dynamodb')
application_autoscaling = boto3.client('application-autoscaling')

## Create table
table = dynamodb.create_table(
    TableName='practice-table',
    KeySchema=[{'AttributeName': 'id', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'id', 'AttributeType': 'S'}],
    BillingMode='PROVISIONED',
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)

## Register the table's write capacity as a scalable target
application_autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/practice-table',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    MinCapacity=5,
    MaxCapacity=100
)

## Attach a target-tracking policy so capacity actually scales
application_autoscaling.put_scaling_policy(
    PolicyName='practice-table-write-scaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/practice-table',
    ScalableDimension='dynamodb:table:WriteCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBWriteCapacityUtilization'
        }
    }
)

🎉 Congratulations! You now understand AWS storage and data services. Practice with free flashcards above, and experiment with the AWS Free Tier to solidify your knowledge. Remember: choosing the right storage service depends on your access patterns, durability requirements, and cost constraints!