AWS Service Decision Guides
Use these decision frameworks when designing architectures for labs, the capstone project, or real-world workloads. Each guide consolidates the selection criteria taught across multiple modules into a single reference.
Compute: Lambda vs. ECS vs. EC2
| Factor | Lambda | ECS (Fargate) | EC2 |
|---|---|---|---|
| Best for | Event-driven, short tasks (<15 min) | Long-running services, microservices | Full OS control, GPU, legacy apps |
| Scaling | Automatic per-invocation | Task-level auto scaling | Auto Scaling groups |
| Pricing | Per request + duration | Per vCPU/memory per second | Per instance, billed per second or per hour depending on platform |
| Cold starts | Yes (mitigated with provisioned concurrency) | Task startup ~30-60s | Instance launch ~1-3 min |
| Max runtime | 15 minutes | No limit | No limit |
| State | Stateless | Stateless or stateful with EFS | Stateful |
| Ops overhead | Minimal | Low | High |
Choose Lambda when you have event-driven workloads, APIs with variable traffic, or data processing triggered by S3/SQS/DynamoDB events.
Choose ECS (Fargate) when you need long-running services, consistent baseline traffic, or container-based microservices without managing servers.
Choose EC2 when you need full OS access, GPU instances, specific kernel configurations, or are running legacy applications that cannot be containerized.
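The three "choose" rules above can be read as an ordered check: hard requirements first, then the Lambda runtime limit, then Fargate as the container default. The following is a minimal sketch of that reading, not an official selection algorithm. The `Workload` class, its field names, and `choose_compute` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

LAMBDA_MAX_RUNTIME_MIN = 15  # Lambda's maximum runtime, taken from the table above


@dataclass
class Workload:
    """Hypothetical workload description; the field names are illustrative."""
    max_runtime_min: float          # longest single task or request, in minutes
    needs_os_or_gpu: bool = False   # full OS access, GPU, or specific kernel config
    containerizable: bool = True    # False for legacy apps that cannot be containerized
    event_driven: bool = False      # S3/SQS/DynamoDB triggers or variable API traffic


def choose_compute(w: Workload) -> str:
    """Return a suggested compute option following the guide's rules of thumb."""
    # Hard requirements rule out the managed options, so check them first.
    if w.needs_os_or_gpu or not w.containerizable:
        return "EC2"
    # Lambda only fits when every invocation finishes within the runtime limit.
    if w.event_driven and w.max_runtime_min <= LAMBDA_MAX_RUNTIME_MIN:
        return "Lambda"
    # Everything else: long-running or steady container workloads.
    return "ECS (Fargate)"
```

For example, `choose_compute(Workload(max_runtime_min=2, event_driven=True))` suggests Lambda, while the same workload with a 45-minute runtime falls through to Fargate.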
Database: RDS vs. DynamoDB vs. ElastiCache
| Factor | RDS | DynamoDB | ElastiCache |
|---|---|---|---|
| Data model | Relational (SQL) | Key-value / document (NoSQL) | Key-value (in-memory) |
| Best for | Complex queries, joins, transactions | High-throughput, predictable access patterns | Caching, session storage, leaderboards |
| Scaling | Vertical (instance size) + read replicas | Horizontal (automatic partitioning) | Cluster mode with sharding |
| Latency | Single-digit milliseconds | Single-digit milliseconds | Sub-millisecond |
| Pricing | Per instance-hour + storage | Per request + storage (on-demand) or provisioned capacity | Per node-hour |
| Multi-AZ | Multi-AZ standby (automatic failover) | Built-in (3 AZs by default) | Multi-AZ with automatic failover |
| Schema | Fixed schema, migrations required | Schema-less (flexible attributes) | No schema (key-value) |
Choose RDS when you need complex queries with joins, ACID transactions, or your team has strong SQL expertise.
Choose DynamoDB when you have well-defined access patterns, need single-digit millisecond latency at any scale, or want a fully managed service with minimal operational overhead.
Choose ElastiCache when you need sub-millisecond reads for frequently accessed data, session management, or real-time leaderboards. Use it alongside RDS or DynamoDB, not as a replacement.
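Using a cache "alongside" a database usually means the cache-aside pattern: read from the cache first, fall back to the database on a miss, then populate the cache. The sketch below illustrates the pattern with in-memory stand-ins only. The `CacheAside` class is hypothetical, a plain dict plays the role of ElastiCache, and `load_from_db` is whatever function reads from RDS or DynamoDB in a real system.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class CacheAside:
    """Cache-aside reads: the database stays the source of truth."""

    def __init__(self, load_from_db: Callable[[str], Optional[str]], ttl_seconds: float = 60.0):
        self._load = load_from_db      # assumed reader for the backing database
        self._ttl = ttl_seconds
        # Stand-in for ElastiCache: key -> (expiry time, cached value)
        self._cache: Dict[str, Tuple[float, Optional[str]]] = {}
        self.db_reads = 0              # counts how often the database was hit

    def get(self, key: str) -> Optional[str]:
        now = time.monotonic()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]            # cache hit: no database call
        self.db_reads += 1             # miss or expired entry: go to the database
        value = self._load(key)
        self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call after a write so the next read fetches fresh data."""
        self._cache.pop(key, None)
```

Because entries expire and can be invalidated, losing the cache costs latency but never data, which is why it complements the database rather than replacing it.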
Storage: S3 vs. EBS vs. EFS
| Factor | S3 | EBS | EFS |
|---|---|---|---|
| Type | Object storage | Block storage | File storage (NFS) |
| Access | HTTP/HTTPS API | Attached to one EC2 instance (Multi-Attach on io1/io2 only) | Shared across multiple EC2/ECS/Lambda |
| Durability | 99.999999999% (11 nines) | 99.8–99.9% (gp2/gp3/io1), 99.999% (io2) | 99.999999999% (11 nines) |
| Scaling | Unlimited objects | Up to 16 TiB per volume (64 TiB for io2 Block Express) | Automatic (petabyte scale) |
| Latency | ~100ms (first byte) | Sub-millisecond | Low single-digit milliseconds |
| Cost | Lowest (per GB stored) | Medium (per GB provisioned) | Highest (per GB used) |
| Use case | Static assets, backups, data lakes | Boot volumes, databases, high-IOPS apps | Shared config, CMS, container storage |
Choose S3 for static website hosting, backups, data lakes, and any data accessed via HTTP. Use lifecycle policies to move infrequently accessed data to cheaper storage classes.
Choose EBS for EC2 boot volumes, databases (RDS uses EBS under the hood), and any workload requiring low-latency block storage attached to a single instance.
Choose EFS when multiple compute resources (EC2 instances, ECS tasks, Lambda functions) need shared file access.
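The S3 lifecycle policies mentioned above are expressed as a rule document attached to a bucket. The sketch below builds one such document as a plain Python dict. The function name, rule ID, and day thresholds are assumptions chosen for illustration. The 30-day minimum before moving objects to `STANDARD_IA` reflects my understanding of the S3 constraint and is worth verifying against current documentation.

```python
def build_lifecycle_config(prefix: str, ia_after_days: int = 30, glacier_after_days: int = 90) -> dict:
    """Build a lifecycle configuration that tiers objects under `prefix` to cheaper classes."""
    if ia_after_days < 30:
        # Assumption: S3 does not allow transitions to STANDARD_IA before 30 days.
        raise ValueError("ia_after_days must be at least 30")
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier_after_days must be later than ia_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-infrequent-data",   # illustrative rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},        # "" applies the rule to the whole bucket
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

A dict of this shape can be passed as the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`, along with a `Bucket` name of your own.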
Load Balancer: ALB vs. NLB vs. CLB
| Factor | ALB | NLB | CLB (Legacy) |
|---|---|---|---|
| Layer | Layer 7 (HTTP/HTTPS) | Layer 4 (TCP/UDP/TLS) | Layer 4 + basic Layer 7 |
| Best for | Web apps, microservices, APIs | High throughput, low latency, non-HTTP | Legacy applications only |
| Routing | Path-based, host-based, header-based | Port-based | Basic round-robin |
| WebSocket | Yes | Yes | No |
| Static IP | No (use Global Accelerator) | Yes (Elastic IP per AZ) | No |
| Latency | ~ms | ~100μs | ~ms |
| Cost | Per hour + LCU | Per hour + NLCU | Per hour + data |
Choose ALB for HTTP/HTTPS workloads, REST APIs, microservices with path-based routing, or any application that needs Layer 7 features.
Choose NLB for TCP/UDP workloads, extreme performance requirements, static IPs, or when you need to preserve the client source IP.
Avoid CLB for new architectures. It exists for backward compatibility only.
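To make the ALB "Routing" row concrete, the sketch below simulates how Layer 7 listener rules pick a target group from the request's host and path, something a port-based Layer 4 balancer cannot do. It is an in-memory approximation, not the ALB API. The `Rule` class, `route` function, and target group names are hypothetical, and `fnmatchcase` only approximates ALB wildcard matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Rule:
    """Illustrative listener rule: a None condition means 'match anything'."""
    priority: int
    target_group: str
    host: Optional[str] = None   # e.g. "admin.example.com" or "*.example.com"
    path: Optional[str] = None   # e.g. "/api/*"


def route(rules: List[Rule], default_target: str, host: str, path: str) -> str:
    """Return the target group of the first matching rule, lowest priority number first."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.host is not None and not fnmatchcase(host, rule.host):
            continue
        if rule.path is not None and not fnmatchcase(path, rule.path):
            continue
        return rule.target_group
    # No rule matched: fall back to the listener's default action.
    return default_target


rules = [
    Rule(priority=10, target_group="api-tg", path="/api/*"),
    Rule(priority=20, target_group="admin-tg", host="admin.example.com"),
]
```

With these rules, a request for `/api/users` goes to `api-tg` regardless of host, while `admin.example.com/` goes to `admin-tg` and everything else reaches the default target.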
Messaging: SQS vs. SNS vs. EventBridge
| Factor | SQS | SNS | EventBridge |
|---|---|---|---|
| Pattern | Queue (pull) | Pub/sub (push) | Event bus (push, rule-based) |
| Delivery | One consumer per message (standard) | Fan-out to many subscribers | Rule-based routing to targets |
| Ordering | FIFO queues guarantee order; standard queues are best-effort | FIFO topics guarantee order; standard topics are best-effort | No ordering guarantee |
| Retry | Built-in (visibility timeout + DLQ) | Retry policies per subscription | Built-in retry with DLQ |
| Use case | Decouple producer/consumer, buffer spikes | Notifications, fan-out to multiple queues | Cross-service event routing, SaaS integration |
Choose SQS when you need to decouple a producer from a consumer, buffer traffic spikes, or guarantee at-least-once processing with retries.
Choose SNS when you need to fan out a single event to multiple subscribers (email, SQS queues, Lambda functions, HTTP endpoints).
Choose EventBridge when you need content-based routing (filter events by fields), cross-account event delivery, or integration with SaaS providers.
Common pattern: SNS → SQS fan-out. Publish to an SNS topic, subscribe multiple SQS queues. Each queue processes the event independently.
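The fan-out pattern above can be illustrated without any AWS calls. The sketch below uses in-memory stand-ins: `Topic` plays the role of an SNS topic and `Queue` the role of an SQS queue. Both classes and the queue names are hypothetical, and real services add delivery retries, visibility timeouts, and dead-letter queues that are omitted here.

```python
from collections import deque
from typing import Deque, List, Optional


class Queue:
    """Stand-in for an SQS queue: consumers pull messages one at a time."""

    def __init__(self, name: str):
        self.name = name
        self.messages: Deque[dict] = deque()

    def receive(self) -> Optional[dict]:
        return self.messages.popleft() if self.messages else None


class Topic:
    """Stand-in for an SNS topic: every publish is pushed to all subscribers."""

    def __init__(self) -> None:
        self.subscribers: List[Queue] = []

    def subscribe(self, queue: Queue) -> None:
        self.subscribers.append(queue)

    def publish(self, message: dict) -> None:
        for queue in self.subscribers:
            # Each queue gets its own copy, so consumers cannot affect each other.
            queue.messages.append(dict(message))


orders = Topic()
billing, shipping = Queue("billing"), Queue("shipping")
orders.subscribe(billing)
orders.subscribe(shipping)
orders.publish({"order_id": 42, "status": "placed"})
```

After the publish, `billing.receive()` and `shipping.receive()` each return their own copy of the order event, so a slow or failing billing consumer does not block shipping.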
Security: Security Groups vs. NACLs
| Factor | Security Groups | NACLs |
|---|---|---|
| Level | Instance / ENI | Subnet |
| State | Stateful (return traffic auto-allowed) | Stateless (must allow both inbound and outbound) |
| Rules | Allow only | Allow and deny |
| Evaluation | All rules evaluated together | Rules evaluated in order (lowest number first); first match wins |
| Default | Deny all inbound, allow all outbound | Default NACL allows all inbound and outbound; custom NACLs deny all until rules are added |
| Use case | Primary firewall for instances | Subnet-level guardrails, block specific IPs |
Use security groups as your primary firewall. They are stateful, easier to manage, and sufficient for most use cases.
Add NACLs as a secondary defense layer when you need to explicitly deny traffic from specific IP ranges or add subnet-level controls.
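The difference in rule evaluation is easier to see side by side. The sketch below models inbound checks only, for a single port, and ignores protocol and statefulness. The `NaclRule` and `SgRule` classes and both functions are hypothetical simplifications rather than AWS APIs, and the IP ranges are documentation addresses.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List


@dataclass
class NaclRule:
    number: int   # lower numbers are evaluated first
    cidr: str
    port: int
    allow: bool   # NACLs support both allow and deny


@dataclass
class SgRule:
    cidr: str
    port: int     # security group rules are allow-only


def nacl_allows(rules: List[NaclRule], source_ip: str, port: int) -> bool:
    """Ordered evaluation: the first matching rule decides the outcome."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port == port and ip_address(source_ip) in ip_network(rule.cidr):
            return rule.allow
    return False  # nothing matched: implicit deny


def sg_allows(rules: List[SgRule], source_ip: str, port: int) -> bool:
    """Unordered evaluation: traffic passes if any rule allows it."""
    return any(
        r.port == port and ip_address(source_ip) in ip_network(r.cidr) for r in rules
    )


nacl = [
    NaclRule(100, "203.0.113.0/24", 443, allow=False),  # block one range first
    NaclRule(200, "0.0.0.0/0", 443, allow=True),        # then allow everyone else
]
sg = [SgRule("0.0.0.0/0", 443)]
```

Here the security group alone would admit `203.0.113.7` on port 443, but the NACL's lower-numbered deny rule blocks it at the subnet boundary. Swapping the rule numbers would let the broad allow match first, which is why NACL ordering matters.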
DR Strategy: Backup & Restore vs. Pilot Light vs. Warm Standby vs. Active-Active
| Strategy | RTO | RPO | Cost | Complexity |
|---|---|---|---|---|
| Backup & Restore | Hours | Hours | Lowest | Low |
| Pilot Light | Tens of minutes | Minutes | Low | Medium |
| Warm Standby | Minutes | Seconds–minutes | Medium | Medium-high |
| Active-Active | Near-zero | Near-zero | Highest | High |
Choose Backup & Restore for non-critical workloads where hours of downtime are acceptable.
Choose Pilot Light when you need faster recovery than backup/restore but want to minimize cost. Data is replicated continuously and core infrastructure is provisioned in the recovery region, but application servers stay switched off until failover.
Choose Warm Standby for business-critical workloads that need recovery in minutes. A scaled-down copy of the production environment runs continuously.
Choose Active-Active for mission-critical workloads with near-zero tolerance for downtime. Traffic is served from multiple regions simultaneously.
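One way to apply the table is to pick the cheapest strategy whose typical RTO and RPO both fit your targets. The sketch below does that with numeric thresholds that are my own rough translations of "hours", "tens of minutes", and so on. They are assumptions for illustration, not AWS-published figures, and `choose_dr_strategy` is a hypothetical helper.

```python
from typing import List, Tuple

# (strategy, assumed typical RTO in minutes, assumed typical RPO in minutes),
# ordered from cheapest to most expensive as in the table above.
STRATEGIES: List[Tuple[str, float, float]] = [
    ("Backup & Restore", 240, 240),   # "hours" approximated as 4 hours
    ("Pilot Light", 30, 10),          # "tens of minutes" / "minutes"
    ("Warm Standby", 5, 1),           # "minutes" / "seconds to minutes"
    ("Active-Active", 0, 0),          # near-zero, modeled as zero
]


def choose_dr_strategy(rto_target_min: float, rpo_target_min: float) -> str:
    """Return the cheapest strategy that meets both recovery targets."""
    for name, typical_rto, typical_rpo in STRATEGIES:
        if typical_rto <= rto_target_min and typical_rpo <= rpo_target_min:
            return name
    # Targets stricter than anything listed: only Active-Active comes close.
    return "Active-Active"
```

For instance, an 8-hour RTO and RPO selects Backup & Restore, while a 1-hour RTO with a 15-minute RPO selects Pilot Light. Note that a tight RPO alone can force a more expensive tier even when the RTO target is relaxed.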
AWS Bootcamp: From Novice to Architect | Author: Samuel Ogunti | License: CC BY-NC 4.0