AWS Service Decision Guides

When to use this guide: This is reference material designed for Phases 3-5 of the bootcamp (Weeks 4-8). If you're in Weeks 1-3, bookmark this page and return when you begin designing architectures. You won't fully appreciate these comparisons until you have hands-on experience with the underlying services.

How to use this guide: When you need to choose between similar AWS services (e.g., Lambda vs ECS, RDS vs DynamoDB), find the relevant decision table below. Start with the workload requirements, not the service. Read the "Choose X when..." recommendations below each table. If you're still unsure, work through the scenario examples.

Compute: Lambda vs. ECS vs. EC2

Factor	Lambda	ECS (Fargate)	EC2
Best for	Event-driven, short tasks (<15 min)	Long-running services, microservices	Full OS control, GPU, legacy apps
Scaling	Automatic per-invocation	Task-level auto scaling	Auto Scaling groups
Pricing	Per request + duration	Per vCPU/memory per second	Per instance, billed per second or per hour depending on OS
Cold starts	Yes (mitigated with provisioned concurrency)	Task startup ~30-60s	Instance launch ~1-3 min
Max runtime	15 minutes	No limit	No limit
State	Stateless	Stateless or stateful with EFS	Stateful
Ops overhead	Minimal	Low	High

Choose Lambda when you have event-driven workloads, APIs with variable traffic, or data processing triggered by S3/SQS/DynamoDB events.

Choose ECS (Fargate) when you need long-running services, consistent baseline traffic, or container-based microservices without managing servers.

Choose EC2 when you need full OS access, GPU instances, specific kernel configurations, or are running legacy applications that cannot be containerized.
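
The three "Choose X when..." rules above can be written down as a small decision function. This is only a sketch: the `Workload` fields and the order of the checks are assumptions made for illustration, and a real decision would also weigh cost and team experience.

```python
from dataclasses import dataclass

LAMBDA_MAX_RUNTIME_MINUTES = 15  # Lambda's max runtime, from the table above


@dataclass
class Workload:
    """Hypothetical workload description; the field names are illustrative."""
    max_runtime_minutes: float      # longest single task or request
    event_driven: bool = False      # triggered by S3/SQS/DynamoDB events or API calls
    needs_gpu: bool = False
    needs_os_control: bool = False  # kernel settings, custom JVM tuning, etc.
    containerizable: bool = True


def choose_compute(w: Workload) -> str:
    # EC2 first: these needs cannot be met by Lambda or Fargate in this guide.
    if w.needs_gpu or w.needs_os_control or not w.containerizable:
        return "EC2"
    # Lambda only fits event-driven work that finishes within the runtime limit.
    if w.event_driven and w.max_runtime_minutes <= LAMBDA_MAX_RUNTIME_MINUTES:
        return "Lambda"
    # Everything else is a long-running, containerized service.
    return "ECS (Fargate)"


print(choose_compute(Workload(max_runtime_minutes=0.5, event_driven=True)))
print(choose_compute(Workload(max_runtime_minutes=60)))
print(choose_compute(Workload(max_runtime_minutes=600, needs_gpu=True)))
```

The checks run from most constraining (GPU, OS control) to least, so a workload that is both event-driven and GPU-bound still lands on EC2.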

Scenario examples:

Scenario	Recommended	Why
REST API with 10 requests/minute during business hours, near-zero at night	Lambda + API Gateway	Variable traffic, pay-per-request, no idle cost
Web app serving 1,000 concurrent users 24/7	ECS (Fargate)	Consistent baseline traffic, long-running, no server management
Machine learning model training with GPU	EC2 (P-series instances)	GPU required, long-running, full OS control
Thumbnail generation triggered by S3 uploads	Lambda	Event-driven, short execution, automatic scaling
Legacy Java app that requires specific JVM tuning	EC2	Full OS control, custom JVM configuration

Database: RDS vs. DynamoDB vs. ElastiCache

Factor	RDS	DynamoDB	ElastiCache
Data model	Relational (SQL)	Key-value / document (NoSQL)	Key-value (in-memory)
Best for	Complex queries, joins, transactions	High-throughput, predictable access patterns	Caching, session storage, leaderboards
Scaling	Vertical (instance size) + read replicas	Horizontal (automatic partitioning)	Cluster mode with sharding
Latency	Single-digit milliseconds	Single-digit milliseconds	Sub-millisecond
Pricing	Per instance-hour + storage	Per request + storage (on-demand) or provisioned capacity	Per node-hour
Multi-AZ	Multi-AZ standby (automatic failover)	Built-in (3 AZs by default)	Multi-AZ with automatic failover
Schema	Fixed schema, migrations required	Schema-less (flexible attributes)	No schema (key-value)

Choose RDS when you need complex queries with joins, ACID transactions, or your team has strong SQL expertise.

Choose DynamoDB when you have well-defined access patterns, need single-digit millisecond latency at any scale, or want minimal operational overhead (no servers to patch or resize).

Choose ElastiCache when you need sub-millisecond reads for frequently accessed data, session management, or real-time leaderboards. Use it alongside RDS or DynamoDB, not as a replacement.
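
Using a cache "alongside" a database usually means the cache-aside pattern: read the cache first, fall back to the database on a miss, then store the result with a TTL. The sketch below uses an in-memory `TTLCache` class as a stand-in for ElastiCache and a plain function as a stand-in for the database query. Both are hypothetical and exist only to show the control flow; a real application would use a Redis or Memcached client.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for ElastiCache (not a real client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._items[key] = (self._clock() + ttl_seconds, value)


def get_with_cache(key: str, cache: TTLCache,
                   load_from_db: Callable[[str], str],
                   ttl_seconds: float = 300.0) -> str:
    value = cache.get(key)
    if value is not None:
        return value                      # cache hit: no database read
    value = load_from_db(key)             # cache miss: go to the source of truth
    cache.set(key, value, ttl_seconds)    # populate for later readers
    return value


db_reads = []


def load_product(key: str) -> str:
    """Hypothetical database lookup that records each read."""
    db_reads.append(key)
    return f"row-for-{key}"


cache = TTLCache()
get_with_cache("product#42", cache, load_product)
get_with_cache("product#42", cache, load_product)  # served from the cache
print(len(db_reads))
```

The database stays the source of truth. If the cache is lost, every read falls through to `load_from_db`, which is why the cache complements RDS or DynamoDB rather than replacing them.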

Scenario examples:

Scenario	Recommended	Why
E-commerce order history with joins across customers, products, and payments	RDS (PostgreSQL or MySQL)	Complex relational queries, ACID transactions
Product catalog browsed by millions of users with simple key lookups	DynamoDB	High throughput, predictable access pattern, single-digit ms latency
Session storage for a web app with 50,000 concurrent users	ElastiCache (Redis)	Sub-millisecond reads, automatic expiration (TTL)
IoT sensor data with 100,000 writes/second	DynamoDB	Horizontal scaling, high write throughput
Financial reporting with ad-hoc SQL queries across multiple tables	RDS (PostgreSQL)	Complex joins, aggregations, ad-hoc queries

Storage: S3 vs. EBS vs. EFS

Factor	S3	EBS	EFS
Type	Object storage	Block storage	File storage (NFS)
Access	HTTP/HTTPS API	Attached to one EC2 instance (or multi-attach io2)	Shared across multiple EC2/ECS/Lambda
Durability	99.999999999% (11 nines)	99.8-99.9% (gp3, io1) or 99.999% (io2)	99.999999999% (11 nines)
Scaling	Unlimited objects	Up to 16 TiB (gp3) or 64 TiB (io2 Block Express) per volume	Automatic (petabyte scale)
Latency	~100ms (first byte)	Sub-millisecond	Low single-digit milliseconds
Cost	Lowest (per GB stored)	Medium (per GB provisioned)	Highest (per GB used)
Use case	Static assets, backups, data lakes	Boot volumes, databases, high-IOPS apps	Shared config, CMS, container storage
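
To make the "11 nines" figure concrete, the arithmetic below treats it as an annual per-object durability and computes the expected number of objects lost per year. It is a simple expectation, not a guarantee; the function name and the 10 million object count are illustrative assumptions.

```python
from fractions import Fraction


def expected_annual_object_loss(object_count: int, durability: str) -> Fraction:
    """Expected objects lost per year for an annual durability given as a
    decimal string, e.g. "0.99999999999" for 11 nines."""
    # Fraction keeps the arithmetic exact; floats would round 1 - 0.99999999999.
    return object_count * (1 - Fraction(durability))


loss_per_year = expected_annual_object_loss(10_000_000, "0.99999999999")
print(loss_per_year)       # expected objects lost per year
print(1 / loss_per_year)   # average years between single-object losses
```

With 10 million objects, the expectation works out to one lost object roughly every 10,000 years.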

Choose S3 for static website hosting, backups, data lakes, and any data accessed via HTTP. Use lifecycle policies to move infrequently accessed data to cheaper storage classes.

Choose EBS for EC2 boot volumes, databases (RDS uses EBS under the hood), and any workload requiring low-latency block storage attached to a single instance.

Choose EFS when multiple compute resources (EC2 instances, ECS tasks, Lambda functions) need shared file access.

Scenario examples:

Scenario	Recommended	Why
Static website assets (images, CSS, JS) served via CloudFront	S3	Object storage, HTTP access, lowest cost, CDN-friendly
PostgreSQL database storage	EBS (gp3 or io2)	Block storage, sub-millisecond latency, attached to RDS instance
Shared configuration files across 10 ECS containers	EFS	NFS mount shared across multiple tasks
Data lake with 50 TB of Parquet files queried by Athena	S3	Unlimited scale, lowest cost, native Athena integration
Machine learning training data accessed by multiple EC2 instances	EFS	Shared access, automatic scaling

Load Balancer: ALB vs. NLB vs. CLB

Factor	ALB	NLB	CLB (Legacy)
Layer	Layer 7 (HTTP/HTTPS)	Layer 4 (TCP/UDP/TLS)	Layer 4 + basic Layer 7
Best for	Web apps, microservices, APIs	High throughput, low latency, non-HTTP	Legacy applications only
Routing	Path-based, host-based, header-based	Port-based	Basic round-robin
WebSocket	Yes	Yes	No
Static IP	No (use Global Accelerator)	Yes (Elastic IP per AZ)	No
Latency	Milliseconds	~100 μs	Milliseconds
Cost	Per hour + LCU	Per hour + NLCU	Per hour + data

Choose ALB for HTTP/HTTPS workloads, REST APIs, microservices with path-based routing, or any application that needs Layer 7 features.

Choose NLB for TCP/UDP workloads, extreme performance requirements, static IPs, or when you need to preserve the client source IP.

Avoid CLB for new architectures. It exists for backward compatibility only.
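
The path-based routing that sets ALB apart can be pictured as an ordered rule list: rules are checked from the lowest priority number upward, and a default action handles anything that matches no rule. The sketch below imitates that with Python's `fnmatch`. The target group names are hypothetical, and `fnmatch` wildcards are only an approximation of ALB path patterns.

```python
from fnmatch import fnmatchcase
from typing import List, NamedTuple


class Rule(NamedTuple):
    priority: int       # lower numbers are evaluated first
    path_pattern: str   # e.g. "/api/*"
    target_group: str   # hypothetical target group name


def route(path: str, rules: List[Rule], default_target: str) -> str:
    for rule in sorted(rules, key=lambda r: r.priority):
        if fnmatchcase(path, rule.path_pattern):
            return rule.target_group  # first matching rule wins
    return default_target             # default action when nothing matches


rules = [
    Rule(10, "/api/*", "backend-tg"),
    Rule(20, "/web/*", "frontend-tg"),
]
print(route("/api/orders/7", rules, "default-tg"))
print(route("/web/index.html", rules, "default-tg"))
print(route("/health", rules, "default-tg"))
```

An NLB has no equivalent of this step: it forwards by port and protocol without inspecting the request path.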

Scenario examples:

Scenario	Recommended	Why
REST API with /api/* routed to backend, /web/* to frontend	ALB	Path-based routing, Layer 7, HTTP-aware
Real-time gaming server handling millions of UDP packets/second	NLB	Layer 4, ultra-low latency, UDP support
gRPC microservices with HTTP/2	ALB	HTTP/2 and gRPC support, target group routing
VPN endpoint requiring a static IP address	NLB	Static Elastic IP per AZ

Messaging: SQS vs. SNS vs. EventBridge

Factor	SQS	SNS	EventBridge
Pattern	Queue (pull)	Pub/sub (push)	Event bus (push, rule-based)
Delivery	One consumer per message (standard)	Fan-out to many subscribers	Rule-based routing to targets
Ordering	FIFO queues guarantee order (standard queues are best-effort)	FIFO topics guarantee order (standard topics do not)	No ordering guarantee
Retry	Built-in (visibility timeout + DLQ)	Retry policies per subscription	Built-in retry with DLQ
Use case	Decouple producer/consumer, buffer spikes	Notifications, fan-out to multiple queues	Cross-service event routing, SaaS integration

Choose SQS when you need to decouple a producer from a consumer, buffer traffic spikes, or guarantee at-least-once processing with retries.
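
The retry behavior behind "at-least-once processing with retries" can be simulated in a few lines. In the sketch below, a failed message returns to the queue and is retried until its receive count reaches `max_receive_count` (named after the `maxReceiveCount` setting of an SQS redrive policy), after which it moves to a dead-letter list. This is an in-memory model with assumed names, not the SQS API, and it ignores visibility timeout timing.

```python
from collections import deque
from typing import Callable, List, Tuple


def drain_queue(messages: List[str], handler: Callable[[str], None],
                max_receive_count: int) -> Tuple[List[str], List[str]]:
    queue = deque(messages)
    receive_counts = {body: 0 for body in messages}  # assumes unique bodies
    processed: List[str] = []
    dead_letter: List[str] = []
    while queue:
        body = queue.popleft()
        receive_counts[body] += 1
        try:
            handler(body)
            processed.append(body)        # success: message is deleted
        except Exception:
            if receive_counts[body] >= max_receive_count:
                dead_letter.append(body)  # retries exhausted: park in the DLQ
            else:
                queue.append(body)        # becomes visible again for a retry
    return processed, dead_letter


def handle(body: str) -> None:
    """Hypothetical consumer that cannot process 'poison' messages."""
    if body.startswith("poison"):
        raise ValueError("cannot parse message")


print(drain_queue(["order-1", "poison-2", "order-3"], handle, max_receive_count=3))
```

The poison message is attempted three times and then set aside, so it cannot block the healthy messages behind it.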

Choose SNS when you need to fan out a single event to multiple subscribers (email, SQS queues, Lambda functions, HTTP endpoints).

Choose EventBridge when you need content-based routing (filter events by fields), cross-account event delivery, or integration with SaaS providers.

Common pattern: SNS → SQS fan-out. Publish to an SNS topic, subscribe multiple SQS queues. Each queue processes the event independently.
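
The fan-out pattern is easiest to see in a toy model. The `Topic` class below is an in-memory stand-in for an SNS topic, and each `deque` stands in for a subscribed SQS queue. All names are hypothetical, and nothing here calls AWS.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """In-memory stand-in for an SNS topic (not the real API)."""

    def __init__(self) -> None:
        self._subscribers: List[Deque[dict]] = []

    def subscribe(self, queue: Deque[dict]) -> None:
        self._subscribers.append(queue)

    def publish(self, event: dict) -> None:
        for queue in self._subscribers:
            queue.append(dict(event))  # every queue receives its own copy


signup_topic = Topic()
queues: Dict[str, Deque[dict]] = {
    name: deque() for name in ("email", "analytics", "provisioning")
}
for queue in queues.values():
    signup_topic.subscribe(queue)

signup_topic.publish({"type": "user_signed_up", "user_id": "u-123"})

# The email consumer finishes its copy; the other queues are unaffected.
queues["email"].popleft()
print({name: len(queue) for name, queue in queues.items()})
```

Because each queue holds its own copy, a slow or failing consumer delays only its own queue.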

Scenario examples:

Scenario	Recommended	Why
Order processing: decouple the web tier from the payment processor	SQS	Queue buffers requests, retries on failure, DLQ for poison messages
New user signup triggers email, analytics, and provisioning	SNS → SQS fan-out	One event fans out to three independent consumers
Route S3 upload events to different Lambda functions based on file type	EventBridge	Content-based filtering on event fields (e.g., file extension)
Process exactly-once financial transactions in strict order	SQS FIFO	Exactly-once processing, guaranteed ordering by message group

IaC: CloudFormation vs. SAM vs. CDK vs. Terraform

Factor	CloudFormation	SAM	CDK	Terraform
Language	YAML / JSON	YAML (shorthand)	TypeScript, Python, Java, etc.	HCL
Best for	Any AWS resource	Serverless apps (Lambda, API GW, DynamoDB)	Complex infra with logic (loops, conditions)	Multi-cloud or team already using Terraform
Learning curve	Medium	Low (if you know CloudFormation)	Medium (requires programming)	Medium
State management	Managed by AWS (stacks)	Managed by AWS (stacks)	Managed by AWS (stacks)	State file (S3 + DynamoDB for locking)
Drift detection	Yes (via CloudFormation)	Yes (via CloudFormation)	Yes (via CloudFormation)	Yes (terraform plan)
AWS integration	Native	Native	Native (compiles to CloudFormation)	Provider-based

Choose CloudFormation when you need direct, declarative YAML/JSON templates and your team prefers configuration over code.

Choose SAM when building serverless applications. SAM shorthand reduces boilerplate for Lambda, API Gateway, and DynamoDB resources.

Choose CDK when you need programming constructs (loops, conditionals, abstractions) or your team prefers writing infrastructure in a familiar language.

Choose Terraform when you manage resources across multiple cloud providers or your organization has standardized on Terraform.
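
To see why "programming constructs" matter, compare writing the same queue definition by hand several times with generating it in a loop. The sketch below uses plain Python to emit a CloudFormation template as JSON. It is not the CDK API, only an illustration of the idea that code can expand into a declarative template; the queue names and the `build_template` helper are assumptions.

```python
import json
from typing import Dict, List


def build_template(queue_names: List[str], with_dlq: bool) -> Dict:
    resources: Dict[str, Dict] = {}
    for name in queue_names:  # a loop replaces copy-pasted YAML blocks
        logical_id = name.title().replace("-", "") + "Queue"
        properties: Dict = {"QueueName": name}
        if with_dlq:          # a conditional adds optional resources
            dlq_id = logical_id + "Dlq"
            resources[dlq_id] = {
                "Type": "AWS::SQS::Queue",
                "Properties": {"QueueName": name + "-dlq"},
            }
            properties["RedrivePolicy"] = {
                "deadLetterTargetArn": {"Fn::GetAtt": [dlq_id, "Arn"]},
                "maxReceiveCount": 5,
            }
        resources[logical_id] = {"Type": "AWS::SQS::Queue", "Properties": properties}
    return {"AWSTemplateFormatVersion": "2010-09-09", "Resources": resources}


template = build_template(["email", "order-events"], with_dlq=True)
print(json.dumps(template, indent=2))
```

Two names produce four resources here. In hand-written YAML, each new queue would mean copying and editing two more blocks.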

Security: Security Groups vs. NACLs

Factor	Security Groups	NACLs
Level	Instance / ENI	Subnet
State	Stateful (return traffic auto-allowed)	Stateless (must allow both inbound and outbound)
Rules	Allow only	Allow and deny
Evaluation	All rules evaluated together	Rules evaluated in order (lowest number first; first match wins)
Default	Deny all inbound, allow all outbound	Default NACL allows all inbound and outbound; custom NACLs deny all until rules are added
Use case	Primary firewall for instances	Subnet-level guardrails, block specific IPs

Use security groups as your primary firewall. They are stateful, easier to manage, and sufficient for most use cases.

Add NACLs as a secondary defense layer when you need to explicitly deny traffic from specific IP ranges or add subnet-level controls.
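
The difference in rule evaluation is the part that most often trips people up, so here is a small model of it. The sketch covers inbound traffic only and does not model statefulness. The rule shapes (`NaclRule`, `SgRule`) and the example addresses are assumptions for illustration.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple


class NaclRule(NamedTuple):
    number: int   # evaluated lowest first
    cidr: str
    port: int
    allow: bool   # NACL rules can allow or deny


class SgRule(NamedTuple):
    cidr: str
    port: int     # security group rules can only allow


def nacl_allows(rules: List[NaclRule], src: str, port: int) -> bool:
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip_address(src) in ip_network(rule.cidr):
            return rule.allow  # first matching rule decides
    return False               # nothing matched: implicit deny


def sg_allows(rules: List[SgRule], src: str, port: int) -> bool:
    # All rules are considered together; any matching allow is enough.
    return any(r.port == port and ip_address(src) in ip_network(r.cidr)
               for r in rules)


nacl = [
    NaclRule(90, "203.0.113.0/24", 443, allow=False),  # block one range
    NaclRule(100, "0.0.0.0/0", 443, allow=True),
]
sg = [SgRule("0.0.0.0/0", 443)]

for src in ("203.0.113.9", "198.51.100.7"):
    print(src, nacl_allows(nacl, src, 443), sg_allows(sg, src, 443))
```

The security group alone cannot block the 203.0.113.0/24 range, because it has no deny rules. The NACL can, provided the deny rule has a lower number than the broad allow.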

DR Strategy: Backup & Restore vs. Pilot Light vs. Warm Standby vs. Active-Active

Strategy	RTO	RPO	Cost	Complexity	Key AWS Services
Backup & Restore	Hours	Hours	Lowest	Low	AWS Backup, S3 Cross-Region Replication
Pilot Light	10s of minutes	Minutes	Low	Medium	RDS cross-Region read replica, Route 53, AMIs
Warm Standby	Minutes	Seconds–minutes	Medium	Medium-high	Auto Scaling (scaled down), RDS replica, Route 53 failover
Active-Active	Near-zero	Near-zero	Highest	High	DynamoDB Global Tables, Route 53 multivalue/latency, Aurora Global

Choose Backup & Restore for non-critical workloads where hours of downtime are acceptable.

Choose Pilot Light when you need faster recovery than backup/restore but want to minimize cost. Data replicates continuously to the recovery Region, while application servers stay switched off or unprovisioned until failover, when they are started and scaled up.

Choose Warm Standby for business-critical workloads that need recovery in minutes. A scaled-down copy of the production environment runs continuously.

Choose Active-Active for mission-critical workloads with near-zero tolerance for downtime. Traffic is served from multiple regions simultaneously.
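
One way to apply the table is to start from the required RTO and RPO and pick the cheapest strategy that can meet both. The sketch below does that. The minute values attached to each strategy are illustrative assumptions that turn "hours", "10s of minutes", and "minutes" into numbers; they are not AWS-defined limits, so substitute figures from your own recovery tests.

```python
from typing import List, Tuple

# (strategy, best achievable RTO in minutes, best achievable RPO in minutes),
# ordered from cheapest to most expensive. All numbers are assumptions.
STRATEGIES: List[Tuple[str, float, float]] = [
    ("Backup & Restore", 240, 240),
    ("Pilot Light", 30, 10),
    ("Warm Standby", 5, 1),
    ("Active-Active", 0, 0),
]


def choose_dr_strategy(rto_minutes: float, rpo_minutes: float) -> str:
    """Return the cheapest strategy whose assumed RTO and RPO meet the targets."""
    for name, best_rto, best_rpo in STRATEGIES:
        if best_rto <= rto_minutes and best_rpo <= rpo_minutes:
            return name
    return STRATEGIES[-1][0]  # fall back to the strongest option


print(choose_dr_strategy(rto_minutes=24 * 60, rpo_minutes=24 * 60))
print(choose_dr_strategy(rto_minutes=60, rpo_minutes=15))
print(choose_dr_strategy(rto_minutes=10, rpo_minutes=1))
print(choose_dr_strategy(rto_minutes=0, rpo_minutes=0))
```

Iterating cheapest-first means the function never recommends more resilience than the targets require.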

Scenario examples:

Scenario	Recommended	Why
Internal wiki used by 50 employees	Backup & Restore	Low criticality, hours of downtime acceptable, lowest cost
E-commerce site with $10K/hour revenue	Warm Standby	Minutes of downtime acceptable, cost-justified by revenue impact
Payment processing system for a bank	Active-Active	Near-zero downtime required, regulatory requirements
Development/staging environment	Backup & Restore	Non-production, rebuild from IaC templates if needed

Quick Reference: Which Service for Which Job?

I need to...	Use this
Run code in response to an event	Lambda
Run a containerized web service 24/7	ECS (Fargate)
Run a VM with full OS control	EC2
Store and query relational data	RDS
Store key-value data at massive scale	DynamoDB
Cache frequently accessed data	ElastiCache
Store files, images, backups	S3
Attach a disk to an EC2 instance	EBS
Share files across multiple instances	EFS
Route HTTP traffic to microservices	ALB
Route TCP/UDP traffic with static IPs	NLB
Decouple two services with a queue	SQS
Send one event to many consumers	SNS
Route events based on content	EventBridge
Define infrastructure as YAML	CloudFormation
Define serverless infrastructure	SAM
Define infrastructure as code (TypeScript/Python)	CDK
Encrypt data at rest	KMS
Store secrets securely	Secrets Manager
Monitor metrics and set alarms	CloudWatch
Trace requests across services	X-Ray
Automate deployments	CodePipeline
Serve static content globally	CloudFront
Register a domain and route DNS	Route 53