When does a document database make more sense?

A document database is a strong fit when each entity naturally contains all its related data (a blog post with embedded comments), when the schema evolves frequently (product catalogues with heterogeneous attributes), when you need to scale horizontally across many nodes, or when the development team prefers working with JSON objects that map directly to application objects.

Can I do JOINs in a document database?

Most document databases do not support JOINs natively in the relational sense. MongoDB offers a $lookup aggregation stage, but it is far less powerful than SQL JOINs and generally discouraged for hot paths. The document model instead encourages embedding related data in the parent document (denormalisation) so that a single read retrieves everything you need.

Is it true that document databases scale better?

Document databases are designed for horizontal scaling (sharding data across many servers) from the start, which makes write-heavy, high-volume workloads easier to distribute. Relational databases can also scale horizontally (read replicas, partitioning, CockroachDB, Citus), but it requires more configuration. If you are already on a single powerful server, a relational database will likely outperform a document database on complex queries.

What is BASE and how does it relate to document databases?

BASE stands for Basically Available, Soft state, Eventually consistent. It describes the trade-offs many distributed NoSQL systems make: they sacrifice strict consistency for availability and partition tolerance (the CAP theorem). A document database cluster may temporarily return stale data on a secondary node before replication catches up. This is acceptable for many use cases (social feeds, caching layers) but unacceptable for financial ledgers where you need a precise, consistent balance at all times.

Database architecture comparison

Relational vs Document Database

SQL or NoSQL? Normalise or embed? Choosing between a relational and a document database is one of the earliest — and most consequential — decisions in any new project. This guide explains the architectural trade-offs in plain English, with the vocabulary and phrases you will hear in real engineering discussions.

TL;DR

Relational databases (PostgreSQL, MySQL, SQLite) store data in tables with a fixed schema. Relationships are expressed with foreign keys and queried with JOINs. Full ACID transactions are guaranteed.
Document databases (MongoDB, Firestore, CouchDB) store self-contained documents (JSON/BSON) inside collections. Schema is flexible. Related data is often embedded rather than joined. Distributed clusters follow BASE principles by default.
Neither is universally superior. Relational databases win on complex queries and data integrity; document databases win on schema flexibility and horizontal scaling for certain access patterns.

Side-by-side comparison

Aspect	Relational (SQL)	Document (NoSQL)
Data unit	Row in a table	Document in a collection
Schema	Fixed, enforced at write time (DDL)	Flexible — optional validation only
Relationships	Foreign keys + JOINs	Embedding or application-level references
Query language	SQL — declarative, standardised	DB-specific API (MQL, GQL, etc.)
Transactions	Full ACID across many tables	ACID on single document; multi-doc ACID available (opt-in)
Consistency model	Strong (ACID)	Eventual by default in distributed clusters (BASE)
Normalisation	Encouraged — reduce duplication via normal forms	Denormalisation common — duplicate data for read speed
Horizontal scaling	Possible but complex (sharding, partitioning)	Built-in — designed for sharded clusters from the start
Query flexibility	Very high — ad-hoc JOINs across any tables	Best when queries match the document shape
Typical use cases	Finance, ERP, e-commerce, reporting	Content management, user profiles, catalogues, real-time apps

What is a relational database?

A relational database organises data into tables (also called relations). Each table has named columns with defined data types, and every row must conform to that structure. Relationships between tables are expressed through foreign keys — a column whose value matches the primary key in another table.

The query language is SQL (Structured Query Language), which allows you to combine data from multiple tables using JOINs, filter with WHERE clauses, and aggregate with GROUP BY. Because the schema is known in advance, the database engine can optimise queries, enforce constraints (NOT NULL, UNIQUE, CHECK), and guarantee ACID properties across all writes — even those touching multiple tables in a single transaction.

Normalisation is the practice of splitting data across multiple tables to remove duplication and keep each piece of information in one place. A fully normalised schema has no redundancy but requires JOINs to reassemble related data at query time.

What is a document database?

A document database stores records as documents — typically JSON or BSON objects — inside named collections. Unlike a table row, a document can contain nested objects and arrays, so related data can live together in a single record rather than being split across tables.

This embedding strategy is the key architectural difference. Instead of a users table and a separate addresses table linked by a foreign key, a document database might store the address directly inside the user document. A single read retrieves everything you need with no JOIN required.

The trade-off is denormalisation: the same piece of data may appear in many documents. If a product name changes, you may need to update it in thousands of documents rather than once in a single row. Schema validation is optional, which speeds up development early on but requires discipline as the application grows.

ACID vs BASE — the consistency trade-off

ACID (Atomicity, Consistency, Isolation, Durability) describes the guarantees that relational databases provide. A financial transfer that debits one account and credits another either completes fully or not at all — there is no intermediate state where money is missing from both accounts.

BASE (Basically Available, Soft state, Eventually consistent) describes the approach taken by many distributed document databases. In a cluster spread across multiple data centres, a write to one node may not immediately be visible on another. Most reads will return the correct data most of the time, but there is a brief window where a secondary replica may serve stale data.

This is often acceptable for social feeds or product catalogues, but dangerous for financial ledgers where you need a precise, consistent balance at all times. Modern document databases have narrowed this gap — MongoDB has supported multi-document ACID transactions since version 4.0 — but the default configuration still favours availability and performance over strict consistency.

Schema design side-by-side

Storing a blog post with its author and comments:

Relational — normalised tables

{`-- Three separate tables, linked by foreign keys
CREATE TABLE users (
  id   SERIAL PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE posts (
  id        SERIAL PRIMARY KEY,
  author_id INT REFERENCES users(id),
  title     TEXT,
  body      TEXT
);

CREATE TABLE comments (
  id      SERIAL PRIMARY KEY,
  post_id INT REFERENCES posts(id),
  text    TEXT
);

-- Reassemble with JOINs at query time
SELECT p.title, u.name, c.text
FROM posts p
JOIN users    u ON u.id     = p.author_id
JOIN comments c ON c.post_id = p.id
WHERE p.id = 42;`}

Document — embedded structure

{`// One document in the "posts" collection
{
  "_id": "post_42",
  "title": "Intro to indexing",
  "body": "...",
  "author": {
    "id": "user_7",
    "name": "Alice"
  },
  "comments": [
    { "text": "Great article!", "by": "Bob"   },
    { "text": "Very helpful.",  "by": "Carol" }
  ]
}

// Single read — no JOIN needed
db.posts.findOne({ _id: "post_42" })`}

How engineers talk about relational vs document databases

These phrases come up in architecture reviews, stand-ups, and technical interviews. Understanding them will help you follow — and contribute to — database discussions in English.

Relational / SQL phrases

"We need to run a migration to add the new column — ALTER TABLE blocks writes briefly."
"This query is doing a full table scan — we should add an index on created_at."
"We normalise the schema to avoid data anomalies when we update a record."
"We wrap both inserts in a transaction so they succeed or fail atomically."
"The foreign key constraint prevents orphaned orders if a customer is deleted."

Document / NoSQL phrases

"We embed the address in the user document — it's always fetched together anyway."
"The collection has no fixed schema, so adding new fields doesn't require a migration."
"We're hitting eventual consistency — reads from the secondary can be slightly stale."
"The document is denormalised — we duplicate the product name for faster reads."
"We use a $lookup aggregation stage — it's basically a JOIN, but it's slower."

"The query flexibility of SQL is hard to beat when requirements change — you can JOIN any two tables without schema changes."
"Our access pattern is always 'fetch the whole order with its items' — embedding makes more sense than normalising here."
"We need horizontal scaling — Mongo shards across nodes more easily than Postgres out of the box."

Decision guide

Data has clear, stable relationships (orders, invoices, inventory) → Relational
Complex ad-hoc reporting with many JOINs → Relational
Financial, medical, or legal data requiring strict ACID guarantees → Relational
Schema changes frequently or varies per record (product catalogue, CMS) → Document
Data naturally lives together and is always read as a unit (user profile + preferences) → Document
Write-heavy workload that must scale horizontally across many nodes → Document
Rapid prototyping where the schema is still evolving → Document
Team already knows SQL and the dataset fits comfortably on one server → Relational
Mixed requirements → Both — many production systems use PostgreSQL for transactional data and MongoDB (or Firestore) for flexible, denormalised content

Frequently asked questions

What is the fundamental difference between a relational and a document database?

A relational database stores data in tables with fixed columns and enforces relationships through foreign keys and JOIN operations. A document database stores self-contained JSON (or BSON) documents inside collections, with no fixed schema required. The relational model prioritises integrity and consistency; the document model prioritises flexibility and developer speed.

What does ACID mean, and which databases guarantee it?

ACID stands for Atomicity (a transaction is all-or-nothing), Consistency (the database moves from one valid state to another), Isolation (concurrent transactions do not interfere), and Durability (committed data survives a crash). Traditional relational databases such as PostgreSQL and MySQL provide full ACID guarantees. Most document databases — MongoDB included — offer ACID on single documents by default and multi-document ACID transactions as an opt-in feature added in later versions.

When should I choose a relational database over a document database?

Choose a relational database when your data has clear, stable relationships (orders, order items, products, customers), when you need complex multi-table queries with JOINs, when data integrity and referential constraints are non-negotiable (finance, healthcare, e-commerce inventory), or when your team is comfortable writing SQL.

Show more questions (4)