Learning Objectives

By the end of this module, you will be able to:

Explain what a database is and why applications need persistent data storage
Distinguish between relational databases (tables, rows, columns) and non-relational databases (key-value, document)
Write basic SQL queries to create, read, update, and delete data (CRUD operations)
Define primary keys, foreign keys, and indexes, and explain why they matter
Describe the difference between structured, semi-structured, and unstructured data
Explain what ACID properties mean and why transactions matter for data integrity
Identify common database use cases and match them to the appropriate database type

Prerequisites

Completion of Module 04: APIs and Programming Basics (understanding of data types and JSON)
No prior database experience required

Estimated self-study time:

Activity	Estimated Time
Reading	15 to 20 minutes
Quiz	5 to 10 minutes
Total	25 to 35 minutes

A database is an organized collection of data that can be easily accessed, managed, and updated. Almost every application you use (email, social media, banking, shopping) stores its data in a database.

Without a database, an application would lose all its data every time it restarts. Databases provide persistence (data survives restarts), structure (data is organized), and concurrent access (multiple users can read and write simultaneously without corrupting data).

Files vs. Databases

You might wonder: why not just store data in files? For simple cases, files work fine. But as data grows, files become problematic:

Aspect	Files	Databases
Finding specific data	Scan entire file	Instant lookup via indexes
Multiple users writing	Risk of corruption	Handled safely with transactions
Relationships between data	Manual, error-prone	Built-in with foreign keys
Querying patterns	Write custom code	Standard query language (SQL)
Data integrity	No guarantees	Enforced constraints

✅Tip: Think of a database like a library with a catalog system. You can find any book instantly by title, author, or subject, without scanning every shelf. Files are like a pile of books on the floor.

Relational Databases (SQL)

A relational database organizes data into tables (also called relations). Each table has:

Columns (fields): define what data is stored (name, email, age)
Rows (records): individual entries in the table
Primary key: a unique identifier for each row (like a student ID)

Example: a students table

id	name	email	enrolled_date
1	Jane Doe	jane@example.com	2026-05-23
2	John Smith	john@example.com	2026-05-23
3	Maria Garcia	maria@example.com	2026-05-23

The id column is the primary key: every row has a unique value.

SQL: Structured Query Language

SQL (pronounced "sequel" or "S-Q-L") is the standard language for interacting with relational databases. The four fundamental operations are called CRUD:

Create: insert new data:

INSERT INTO students (name, email, enrolled_date)
VALUES ('Jane Doe', 'jane@example.com', '2026-05-23');

Read: retrieve data:

SELECT name, email FROM students WHERE enrolled_date = '2026-05-23';

Update: modify existing data:

UPDATE students SET email = 'jane.doe@example.com' WHERE id = 1;

Delete: remove data:

DELETE FROM students WHERE id = 3;

Relationships and Foreign Keys

Real applications have related data. An e-commerce app has customers, orders, and products. These are connected:

A customer places many orders
Each order contains many products

A foreign key links one table to another:

CREATE TABLE orders (
  id INTEGER PRIMARY KEY,
  customer_id INTEGER REFERENCES customers(id),
  total_amount DECIMAL,
  order_date DATE
);

The customer_id column references the customers table; this is a foreign key. It ensures every order belongs to a valid customer.

Indexes

An index is like a book's index: it helps the database find data quickly without scanning every row. Without an index on email, finding a student by email requires checking all rows (slow). With an index, it's nearly instant.

CREATE INDEX idx_students_email ON students(email);

Check your understanding: You have a table with 10 million rows. A query searches by email address. Without an index, the database scans all 10 million rows. With an index, it finds the result in microseconds. Why would you not put an index on every column? (Answer: indexes use storage space and slow down writes, because the index must be updated on every insert/update.)

Non-Relational Databases (NoSQL)

Not all data fits neatly into tables with fixed columns. Non-relational databases (often called NoSQL) handle flexible, high-volume, or hierarchical data.

Key-Value Stores

The simplest NoSQL type. Each item is a key (unique identifier) and a value (any data).

Key: "user:1234"
Value: {"name": "Jane", "email": "jane@example.com", "preferences": {...}}

Use cases: caching, session storage, user preferences. AWS example: Amazon DynamoDB, Amazon ElastiCache.

Document Databases

Store data as JSON-like documents. Each document can have different fields; no fixed schema required.

{
  "orderId": "ORD-9876",
  "customer": "Jane Doe",
  "items": [
    {"product": "AWS Certification Guide", "qty": 1, "price": 45.00},
    {"product": "Cloud Architecture Poster", "qty": 2, "price": 15.00}
  ],
  "status": "shipped"
}

Use cases: content management, catalogs, user profiles. AWS example: Amazon DocumentDB, DynamoDB.

When to Use Relational vs. Non-Relational

Factor	Relational (SQL)	Non-Relational (NoSQL)
Data structure	Fixed schema, relationships	Flexible schema, nested data
Query patterns	Complex joins, aggregations	Simple lookups by key
Scale	Vertical (bigger server)	Horizontal (more servers)
Consistency	Strong (ACID)	Eventual (in many cases)
Best for	Financial data, inventory, reporting	Real-time apps, IoT, gaming

✅Tip: In the AWS Bootcamp (Module 06), you'll use Amazon RDS (relational) and Amazon DynamoDB (non-relational). Understanding when to choose each is a key architect skill.

Atomicity: A transaction either completes entirely or not at all. If transferring money between accounts, both the debit and credit happen, or neither does.
Consistency: Data always moves from one valid state to another. Constraints (like "balance cannot be negative") are always enforced.
Isolation: Concurrent transactions don't interfere with each other. Two people buying the last item in stock won't both succeed.
Durability: Once a transaction is committed, it survives crashes. Your data is safe even if the power goes out.

Check your understanding: A banking app transfers $100 from Account A to Account B. The debit from A succeeds, but the system crashes before crediting B. With ACID, what happens? (Answer: the entire transaction rolls back; A gets its $100 back. Without ACID, the $100 would be lost.)

Key Takeaways

Databases provide persistent, organized, concurrent data storage for applications
Relational databases use tables with fixed schemas, SQL for queries, and foreign keys for relationships
Non-relational databases offer flexible schemas and horizontal scaling for high-volume workloads
SQL CRUD operations (INSERT, SELECT, UPDATE, DELETE) are the foundation of data manipulation
ACID properties guarantee data integrity in relational databases
In the AWS Bootcamp, you'll use RDS (relational) and DynamoDB (non-relational); this module gives you the conceptual foundation

AWS Bootcamp: From Novice to Architect Author: Samuel Ogunti License: CC BY-NC 4.0

Module 08: Databases and Data

Learning Objectives

Prerequisites

Concepts

What Is a Database?