Why This Phase Exists
Data has gravity. Once you store a terabyte of objects in a particular service, region, or access pattern, everything else in your architecture begins to orbit around it. Compute moves closer to the data. Applications shape themselves around the APIs you chose for reads and writes. Compliance requirements lock you into retention policies that outlast the engineers who designed the system.
This is why storage decisions are among the most consequential an architect makes. Choose wrong and you pay the tax forever: migrating petabytes between storage classes costs real money and real engineering hours. Choose right and your storage layer becomes invisible, scaling from megabytes to exabytes without architectural changes, delivering the exact durability and latency your workloads demand at the lowest possible cost.
AWS offers purpose-built storage services because no single system optimizes for all access patterns simultaneously. Object storage excels at scale and durability but cannot serve as a POSIX filesystem. Block storage delivers sub-millisecond latency but typically attaches to a single instance. File storage shares across thousands of clients but costs more per gigabyte. Archival storage cuts storage costs by up to roughly 95% (S3 Glacier Deep Archive versus S3 Standard) but can introduce retrieval delays measured in hours.
Your job as a Solutions Architect is to match each data type to the storage service that satisfies its access frequency, durability requirement, compliance constraint, and cost budget. This phase gives you the depth to make those decisions with confidence.
What You Will Master
By the end of Phase 4, you will be able to:
- Design S3 architectures that handle billions of objects with strong read-after-write consistency
- Implement defense-in-depth security for storage using bucket policies, encryption, Block Public Access, and presigned URLs
- Optimize storage costs, often by 60-80% for data with predictable aging patterns, using lifecycle policies that transition objects through storage classes automatically
- Architect compliant archival solutions using WORM (Write Once Read Many) controls such as Glacier Vault Lock and S3 Object Lock for regulatory retention
- Deploy shared filesystems with EFS that scale to petabytes and serve thousands of concurrent NFS clients
- Bridge on-premises storage to AWS using Storage Gateway and Transfer Family for hybrid architectures
- Make the critical S3 vs EBS vs EFS vs Glacier decision correctly on the first attempt for any given workload
Modules in This Phase
| Module | Title | Key Focus Areas |
|---|---|---|
| 21 | S3 Fundamentals | Buckets, objects, keys, operations, strong consistency model, request pricing |
| 22 | S3 Security & Access Control | Bucket policies, ACLs, Block Public Access, presigned URLs, SSE-S3/SSE-KMS/SSE-C encryption |
| 23 | S3 Advanced Features | Versioning, lifecycle policies, replication (CRR/SRR), storage classes, event notifications, Transfer Acceleration |
| 24 | S3 Glacier & Archival | Glacier Instant Retrieval, Flexible Retrieval, Deep Archive, Vault Lock, retrieval tiers |
| 25 | Elastic File System (EFS) | Shared NFS, General Purpose vs Max I/O, Bursting vs Provisioned vs Elastic throughput, lifecycle management, access points |
| 26 | Storage Gateway & Transfer Family | File Gateway, Volume Gateway, Tape Gateway, Transfer Family for SFTP/FTPS/FTP |
The Progressive Path
This phase follows a deliberate progression. Modules 21 through 24 form a complete S3 mastery arc, moving from fundamentals through security, advanced features, and archival. You cannot configure lifecycle policies intelligently (Module 23) without understanding the object model (Module 21) and encryption requirements (Module 22). You cannot architect Glacier solutions (Module 24) without understanding the storage classes and transitions from Module 23.
Module 25 introduces EFS as the answer to a fundamentally different question: what happens when multiple compute instances need to read and write the same files simultaneously? This is not a problem S3 solves. S3 is object storage with an HTTP API. EFS is a POSIX-compliant filesystem that mounts like any NFS share.
Module 26 addresses the hybrid reality. Most enterprises cannot migrate all storage to AWS on day one. Storage Gateway and Transfer Family provide the bridge, exposing cloud storage through protocols that on-premises applications already speak: NFS, SMB, iSCSI, SFTP.
Services You Will Command
Amazon S3
Simple Storage Service is the gravitational center of AWS storage. It provides 11 nines (99.999999999%) of durability, meaning if you store 10 million objects, you can statistically expect to lose one every 10,000 years. S3 stores objects (files up to 5 TB each) in buckets (containers with globally unique names) and exposes them through a simple HTTP API of PUT, GET, DELETE, and LIST operations.
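The durability figure reduces to simple arithmetic. The sketch below assumes the 11-nines figure is read as an annual per-object loss probability of 10^-11, which is how the 10-million-objects example works out; it uses exact fractions to avoid floating-point rounding. The function names are illustrative, not part of any AWS API.

```python
from fractions import Fraction

# Assumption: 11 nines of durability modeled as an annual
# per-object loss probability of 1 - 0.99999999999 = 10**-11.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)


def expected_losses_per_year(object_count: int) -> Fraction:
    """Expected number of objects lost per year across the whole dataset."""
    return object_count * ANNUAL_LOSS_PROBABILITY


def years_between_losses(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_losses_per_year(object_count)


if __name__ == "__main__":
    n = 10_000_000
    print(f"{n:,} objects -> one loss every {years_between_losses(n)} years")
```

For 10,000,000 objects this yields 10,000 years between losses, matching the figure quoted above.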
What makes S3 architecturally significant is its strong read-after-write consistency model. Since December 2020, every successful write to S3 is immediately visible to all subsequent reads and list operations. For object operations there are no more eventual consistency edge cases and no more stale reads after overwrites. This consistency guarantee applies to PUTs of new objects, PUTs that overwrite existing objects, and DELETEs.
You will learn to think in S3 key prefixes (not folders), understand request rate performance (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix), and design bucket structures that scale without hitting partition limits.
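The per-prefix request rates translate directly into a capacity estimate. A minimal sketch, assuming the baseline rates quoted above and a hypothetical `shard-NN/` key scheme; S3 also adds partitions automatically as sustained traffic grows, so treat the result as a planning number rather than a hard limit.

```python
import math
import zlib

# Baseline per-prefix rates (requests per second) quoted for S3.
GET_HEAD_PER_PREFIX = 5_500
WRITE_PER_PREFIX = 3_500  # PUT/COPY/POST/DELETE


def prefixes_needed(reads_per_sec: int, writes_per_sec: int) -> int:
    """Minimum number of key prefixes to spread traffic across at baseline rates."""
    for_reads = math.ceil(reads_per_sec / GET_HEAD_PER_PREFIX)
    for_writes = math.ceil(writes_per_sec / WRITE_PER_PREFIX)
    return max(1, for_reads, for_writes)


def spread_key(object_name: str, shard_count: int) -> str:
    """Hypothetical key scheme: a stable hash assigns each object to a prefix."""
    shard = zlib.crc32(object_name.encode("utf-8")) % shard_count
    return f"shard-{shard:02d}/{object_name}"


if __name__ == "__main__":
    shards = prefixes_needed(reads_per_sec=50_000, writes_per_sec=2_000)
    print(shards, spread_key("img/cat.jpg", shards))
```

A workload reading 50,000 objects per second needs at least 10 prefixes at the baseline GET rate; the CRC32 hash keeps each object's prefix deterministic so readers can recompute the key.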
S3 Glacier
Glacier is best understood as a set of S3 storage classes purpose-built for data archival (the original vault-based S3 Glacier service, where Vault Lock lives, still exists alongside them). The three tiers offer different tradeoffs between cost and retrieval speed:
- Glacier Instant Retrieval: The same millisecond access as S3 Standard, with up to 68% lower storage cost than S3 Standard-Infrequent Access (retrievals cost more). Ideal for data accessed about once per quarter.
- Glacier Flexible Retrieval: Retrieval in minutes to hours depending on the tier (Expedited, Standard, or Bulk). Storage costs roughly 85% less than S3 Standard at us-east-1 list prices. Ideal for backups and disaster recovery datasets.
- Glacier Deep Archive: The lowest-cost storage class in AWS. Retrieval completes within 12 hours (Standard) or 48 hours (Bulk). Designed for data you must retain for 7 to 10 years or longer but rarely (if ever) access.
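The percentage claims are easiest to check with the per-GB prices side by side. This sketch assumes us-east-1 first-tier list prices at the time of writing (prices change and vary by region) and counts storage only, ignoring retrieval, request, and minimum-duration charges. The keys are the S3 API storage class names.

```python
# Assumed list prices in USD per GB-month (us-east-1, first pricing tier).
# Verify against the current S3 pricing page before relying on them.
PRICE_PER_GB_MONTH = {
    "STANDARD": 0.023,
    "STANDARD_IA": 0.0125,
    "GLACIER_IR": 0.004,      # Glacier Instant Retrieval
    "GLACIER": 0.0036,        # Glacier Flexible Retrieval
    "DEEP_ARCHIVE": 0.00099,  # Glacier Deep Archive
}


def monthly_storage_cost(gb: float, storage_class: str) -> float:
    """Storage-only cost; excludes retrieval, request, and minimum-duration charges."""
    return gb * PRICE_PER_GB_MONTH[storage_class]


def savings_vs(storage_class: str, baseline: str) -> float:
    """Fractional storage-price reduction of storage_class relative to baseline."""
    return 1 - PRICE_PER_GB_MONTH[storage_class] / PRICE_PER_GB_MONTH[baseline]


if __name__ == "__main__":
    for cls in PRICE_PER_GB_MONTH:
        print(f"{cls:13s} ${monthly_storage_cost(1024, cls):8.2f}/TB-month  "
              f"{savings_vs(cls, 'STANDARD'):.0%} below Standard")
    print(f"Instant Retrieval vs Standard-IA: {savings_vs('GLACIER_IR', 'STANDARD_IA'):.0%}")
```

Under these assumed prices, the 68% figure for Instant Retrieval holds against Standard-IA, and Deep Archive comes in about 96% below Standard.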
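Vault Lock enables compliance controls with WORM policies on Glacier vaults. Once a Vault Lock policy is locked it can no longer be changed, and even the root account cannot delete archives before the retention period the policy enforces. For objects stored in the S3 Glacier storage classes, S3 Object Lock in compliance mode provides the equivalent WORM guarantee. These controls help satisfy retention requirements such as SEC Rule 17a-4, FINRA, and HIPAA.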
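A Vault Lock policy is an IAM-style JSON document. The sketch below follows the deny-by-archive-age pattern from the Glacier documentation; the vault ARN and the seven-year retention are hypothetical. The policy would be passed to the InitiateVaultLock operation, which leaves the lock in progress for 24 hours so you can test it, and CompleteVaultLock then makes it immutable.

```python
import json

# Hypothetical vault ARN; substitute your own region, account, and vault name.
VAULT_ARN = "arn:aws:glacier:us-east-1:123456789012:vaults/example-records-vault"


def vault_lock_policy(vault_arn: str, retention_days: int) -> dict:
    """Deny archive deletion by anyone until an archive is retention_days old."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-delete-before-retention",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    # Condition values are written as strings in policy documents.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Roughly seven years, ignoring leap days.
    print(json.dumps(vault_lock_policy(VAULT_ARN, 7 * 365), indent=2))
```

Because the statement denies rather than allows, it overrides any permissive IAM policy, which is what makes the lock hold even against the root account.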
Amazon EFS
Elastic File System provides a fully managed NFS filesystem that grows and shrinks automatically as you add and remove files. Unlike EBS volumes (which typically attach to a single EC2 instance), EFS can serve thousands of concurrent connections across multiple Availability Zones simultaneously.
Two performance modes define how EFS handles I/O. General Purpose mode delivers the lowest per-operation latency and works for the vast majority of workloads. Max I/O mode scales to higher aggregate throughput but introduces higher per-operation latency. Throughput modes control bandwidth: Bursting (throughput scales with filesystem size), Provisioned (you specify throughput independent of storage), and, in current EFS, Elastic (throughput scales automatically with workload activity and you pay for what you use).
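Bursting mode's "throughput scales with size" rule can be made concrete. This sketch assumes the rates AWS documented at the time of writing (a baseline of 50 KiB/s per GiB stored in the Standard class, bursts up to 100 MiB/s per TiB with a 100 MiB/s floor) and ignores burst-credit accounting and read/write metering differences; `mode_hint` is a hypothetical helper.

```python
# Assumed Bursting-mode rates; verify against current EFS documentation.
BASELINE_KIBPS_PER_GIB = 50
BURST_MIBPS_PER_TIB = 100
BURST_FLOOR_MIBPS = 100.0


def bursting_baseline_mibps(standard_gib: float) -> float:
    """Sustained throughput in MiB/s earned by the data stored in the Standard class."""
    return standard_gib * BASELINE_KIBPS_PER_GIB / 1024


def bursting_peak_mibps(standard_gib: float) -> float:
    """Maximum burst throughput in MiB/s, available only while burst credits last."""
    return max(BURST_FLOOR_MIBPS, standard_gib / 1024 * BURST_MIBPS_PER_TIB)


def mode_hint(standard_gib: float, required_mibps: float) -> str:
    """Hypothetical helper: can Bursting sustain the load indefinitely?"""
    if bursting_baseline_mibps(standard_gib) >= required_mibps:
        return "bursting"
    return "provisioned-or-elastic"


if __name__ == "__main__":
    for gib in (100, 1024, 10240):
        print(gib, bursting_baseline_mibps(gib), bursting_peak_mibps(gib))
```

A small filesystem holding 100 GiB earns under 5 MiB/s of baseline, so a steady 50 MiB/s workload on it would exhaust its credits; that is the situation where Provisioned or Elastic throughput fits better.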
EFS access points simplify managing application access to shared datasets by enforcing a user identity, root directory, and permissions for each application connecting to the filesystem.
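The three things an access point enforces (user identity, root directory, permissions) map onto the parameters of the EFS CreateAccessPoint API. A minimal sketch that builds the keyword arguments for boto3's `create_access_point`; the filesystem ID, the `/apps/<name>` layout, and the `750` mode are hypothetical choices.

```python
def access_point_request(file_system_id: str, app_name: str, uid: int, gid: int) -> dict:
    """Build keyword arguments for boto3's EFS create_access_point call."""
    return {
        "FileSystemId": file_system_id,
        # Every NFS operation through this access point runs as this POSIX user.
        "PosixUser": {"Uid": uid, "Gid": gid},
        "RootDirectory": {
            # The application sees this directory as "/" and cannot reach above it.
            "Path": f"/apps/{app_name}",
            # EFS creates the directory with this ownership and mode if it is missing.
            "CreationInfo": {"OwnerUid": uid, "OwnerGid": gid, "Permissions": "750"},
        },
        "Tags": [{"Key": "app", "Value": app_name}],
    }


if __name__ == "__main__":
    # Hypothetical filesystem ID; with credentials configured you could run:
    #   boto3.client("efs").create_access_point(**req)
    req = access_point_request("fs-0123456789abcdef0", "billing", 1001, 1001)
    print(req)
```

Giving each application its own access point means two teams can share one filesystem without either seeing the other's directory tree.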
AWS Storage Gateway
Storage Gateway runs as a virtual appliance in your on-premises environment (or as an EC2 instance) and bridges local applications to AWS cloud storage. The three gateway types covered in this phase serve different use cases:
- File Gateway: Presents S3 buckets as NFS or SMB file shares. On-premises applications read and write files normally while data is stored durably in S3. Local caching ensures low-latency access for frequently used files.
- Volume Gateway: Presents cloud-backed iSCSI block storage. Stored Volumes keep full copies locally with asynchronous backups to S3. Cached Volumes keep frequently accessed data locally while the full dataset lives in S3.
- Tape Gateway: Presents a virtual tape library (VTL) to existing backup software. Tapes are archived to S3 Glacier, replacing physical tape infrastructure with cloud durability.
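Choosing a gateway type comes down to the protocol the on-premises application already speaks. The lookup below is a hypothetical distillation of the three descriptions above, not an AWS API; SFTP-style transfers are deliberately absent because they belong to Transfer Family.

```python
# Hypothetical lookup table distilled from the gateway descriptions.
GATEWAY_BY_PROTOCOL = {
    "nfs": "File Gateway",
    "smb": "File Gateway",
    "iscsi": "Volume Gateway",
    "vtl": "Tape Gateway",
}


def choose_gateway(protocol: str, keep_full_copy_on_prem: bool = False) -> str:
    """Map an on-premises protocol to the Storage Gateway type that presents it."""
    gateway = GATEWAY_BY_PROTOCOL.get(protocol.lower())
    if gateway is None:
        raise ValueError(f"no Storage Gateway type presents {protocol!r}")
    if gateway == "Volume Gateway":
        # Stored keeps the full dataset locally; Cached keeps only hot data locally.
        mode = "Stored Volumes" if keep_full_copy_on_prem else "Cached Volumes"
        return f"{gateway} ({mode})"
    return gateway


if __name__ == "__main__":
    print(choose_gateway("SMB"), "|", choose_gateway("iscsi", keep_full_copy_on_prem=True))
```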
AWS Transfer Family
Transfer Family provides fully managed file transfer into and out of S3 or EFS using SFTP, FTPS, FTP, and AS2 protocols. This matters because thousands of business workflows, particularly in financial services and healthcare, depend on SFTP for partner file exchanges. Transfer Family lets you maintain those workflows unchanged while the underlying storage moves to S3.
You configure identity providers (service-managed, Active Directory, or custom Lambda-backed authentication), map users to S3 prefixes or EFS paths, and retain full CloudWatch logging of all transfer activity for audit trails.
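Mapping a partner to an S3 prefix is done per user. A minimal sketch that builds arguments for boto3's Transfer Family `create_user` with a logical home directory, so the partner sees `/` while files land under a bucket prefix. The server ID, role ARN, bucket, and prefix are hypothetical, and the IAM role must separately grant access to that prefix.

```python
def sftp_user_request(server_id: str, user_name: str, role_arn: str,
                      bucket: str, partner_prefix: str) -> dict:
    """Build keyword arguments for boto3's Transfer Family create_user call."""
    return {
        "ServerId": server_id,
        "UserName": user_name,
        # Role that Transfer Family assumes to read and write S3 on the user's behalf.
        "Role": role_arn,
        # LOGICAL hides the real bucket path; the partner only ever sees "/".
        "HomeDirectoryType": "LOGICAL",
        "HomeDirectoryMappings": [
            {"Entry": "/", "Target": f"/{bucket}/{partner_prefix}"}
        ],
    }


if __name__ == "__main__":
    req = sftp_user_request(
        "s-0123456789abcdef0", "acme",
        "arn:aws:iam::123456789012:role/example-transfer-role",
        "partner-drop", "acme",
    )
    print(req["HomeDirectoryMappings"])
```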
The Storage Decision Framework
This is the decision that separates competent architects from exceptional ones. When a workload needs persistent storage, you must choose correctly:
Use Amazon S3 when:
- You need virtually unlimited scale (no capacity planning)
- Access is through HTTP APIs, not filesystem calls
- Data is written once and read many times (or accessed infrequently)
- You need cross-region replication, lifecycle automation, or event-driven processing
- Multiple services or accounts need access to the same data
Use Amazon EBS when:
- A single EC2 instance needs sub-millisecond block-level I/O
- You are running databases, boot volumes, or transactional workloads
- You need consistent IOPS performance (Provisioned IOPS SSD)
- Data must persist independently of instance lifecycle but serves one instance at a time
Use Amazon EFS when:
- Multiple EC2 instances, containers, or Lambda functions must share the same filesystem
- Applications require POSIX-compliant file operations (locks, permissions, directory hierarchy)
- The dataset grows unpredictably and you refuse to manage capacity
- You need cross-AZ access for high availability
Use S3 Glacier when:
- Data must be retained for compliance but is rarely or never accessed
- You can tolerate the retrieval model of the chosen class (milliseconds for Instant Retrieval, minutes to hours for Flexible Retrieval, up to 48 hours for Deep Archive)
- Cost optimization is the primary driver and access patterns are infrequent
- You need WORM compliance with Vault Lock (Glacier vaults) or S3 Object Lock (objects in Glacier storage classes)
The wrong choice here does not break your application on day one. It breaks your budget on day ninety and your architecture on day three hundred. Storage migrations are among the most expensive refactoring exercises in cloud engineering because data volume makes them slow, risky, and disruptive to production workloads.
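The four "Use ... when" lists can be encoded roughly as a first-pass filter. This is a hypothetical sketch: the `Workload` fields are illustrative, the Glacier thresholds assume Deep Archive Standard retrievals finish within 12 hours and Flexible Expedited retrievals within minutes, and a real decision also weighs cost, latency, and compliance detail.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload traits; field names are illustrative only."""
    shared_across_instances: bool = False   # many instances, containers, or Lambdas
    needs_posix: bool = False               # locks, permissions, directory hierarchy
    single_instance_block_io: bool = False  # boot volume, database, transactional I/O
    archival: bool = False                  # retained for compliance, rarely read
    max_retrieval_hours: float = 0.0        # longest acceptable wait to read data back


def glacier_class(max_retrieval_hours: float) -> str:
    """Pick the cheapest Glacier class whose retrieval time fits the tolerance."""
    if max_retrieval_hours >= 12:
        return "S3 Glacier Deep Archive"        # Standard retrieval within 12 hours
    if max_retrieval_hours >= 5 / 60:
        return "S3 Glacier Flexible Retrieval"  # Expedited retrievals take minutes
    return "S3 Glacier Instant Retrieval"       # millisecond access


def choose_storage(w: Workload) -> str:
    """Rough first-pass encoding of the decision framework."""
    if w.shared_across_instances and w.needs_posix:
        return "Amazon EFS"
    if w.single_instance_block_io or w.needs_posix:
        # POSIX for a single instance: a filesystem on an EBS volume suffices.
        return "Amazon EBS"
    if w.archival:
        return glacier_class(w.max_retrieval_hours)
    return "Amazon S3"


if __name__ == "__main__":
    print(choose_storage(Workload(archival=True, max_retrieval_hours=24)))
```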
Architecture Context
Phase 4 builds directly on the networking and compute foundations from earlier phases. Your S3 buckets will serve static assets behind the CloudFront distributions you configured previously. A Gateway endpoint (or an Interface endpoint) for S3 keeps bucket traffic from your VPC off the public internet, and EFS mount targets already sit inside your VPC subnets, so NFS traffic stays private (an Interface endpoint covers the EFS management API). IAM policies from Phase 2 control which principals can access which buckets and objects.
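An endpoint only keeps traffic private if the bucket refuses requests that arrive any other way. A minimal sketch of a bucket policy using the `aws:SourceVpce` condition key; the bucket name and endpoint ID are hypothetical. Be careful: a blanket Deny like this also blocks console and administrator access that does not come through the endpoint.

```python
import json


def vpce_only_bucket_policy(bucket: str, vpce_id: str) -> dict:
    """Deny every S3 action on the bucket unless the request arrives via vpce_id."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level operations.
                "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }


if __name__ == "__main__":
    policy = vpce_only_bucket_policy("example-private-assets", "vpce-0123456789abcdef0")
    # With credentials configured, this string could be passed to
    # boto3.client("s3").put_bucket_policy(Bucket=..., Policy=...).
    print(json.dumps(policy, indent=2))
```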
Looking ahead, the storage patterns you learn here become critical infrastructure for later phases. Database backups (Phase 5) land in S3 with lifecycle policies transitioning them to Glacier. Container images (Phase 7) store layers in S3-backed registries. CI/CD pipelines (Phase 8) use S3 for artifact storage between pipeline stages. Monitoring solutions (Phase 9) archive logs to S3 Intelligent-Tiering for cost-effective retention.
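The backup pattern can be sketched as an S3 lifecycle configuration. The `backups/` prefix and the 30- and 180-day thresholds are assumptions chosen for illustration. With them, objects stay in Glacier Flexible Retrieval for 150 days, past its 90-day minimum storage duration, before moving to Deep Archive. The resulting dict is the shape boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`.

```python
import json


def backup_lifecycle(prefix: str, retention_days: int) -> dict:
    """Move backups to Glacier Flexible Retrieval at 30 days, Deep Archive at 180."""
    if retention_days <= 180:
        raise ValueError("expiration must come after the last transition (day 180)")
    return {
        "Rules": [
            {
                "ID": "db-backups-to-archive",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # GLACIER is the API name for Glacier Flexible Retrieval.
                    {"Days": 30, "StorageClass": "GLACIER"},
                    {"Days": 180, "StorageClass": "DEEP_ARCHIVE"},
                ],
                # Delete once the assumed retention period has elapsed.
                "Expiration": {"Days": retention_days},
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(backup_lifecycle("backups/", 7 * 365), indent=2))
```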
Every architecture you build going forward will have a storage layer. The decisions you learn to make in this phase determine whether that layer is an asset that scales gracefully or a liability that constrains every system built on top of it.
Phase Exam
After completing all six modules, you will take the Phase 4 Storage exam:
- 30 multiple-choice questions covering all services and architectural decisions from this phase
- 50-minute time limit
- 70% pass threshold (21/30 correct)
- Questions emphasize storage selection decisions, security configurations, lifecycle policy design, and cost optimization strategies
- Expect scenario-based questions that present a workload and ask you to select the correct storage service, class, or configuration
- Retrieval tier selection, encryption mode decisions, and replication architecture questions are heavily represented