MongoDB & Document Database Vocabulary: 25 Terms Explained

If you work on a backend team that uses MongoDB, you have probably heard phrases like “shard the collection by user ID” or “run it through the aggregation pipeline” and nodded along while quietly wondering what was actually being said. MongoDB has its own vocabulary — borrowed partly from relational databases, partly from distributed systems, and partly from its own design decisions. This post breaks down 25 essential terms so you can follow technical discussions, read pull request comments, and write accurate documentation without second-guessing yourself.

Core Terms

Document — the fundamental unit of data in MongoDB. A document is a set of key-value pairs stored in a flexible, JSON-like structure. Unlike a row in a relational table, a document can contain nested objects and arrays, and no two documents in the same collection need to share an identical shape.

“This endpoint returns a single user document, so just call findOne and pass it straight to the serialiser.”

“The document schema changed last sprint — there are now two possible shapes in production, so the parser needs to handle both.”
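To make the "two possible shapes" situation concrete, here is a minimal sketch in Python, where each document is represented as a plain dict. The field names (name, fullName, tags, address) and the helper display_name are invented for illustration and do not come from any real schema.

```python
# Two documents from the same hypothetical "users" collection.
# Field names are illustrative only.
old_shape = {
    "_id": 1,
    "name": "Ada Lovelace",
    "tags": ["admin", "beta"],        # array value
    "address": {"city": "London"},    # nested object
}
new_shape = {
    "_id": 2,
    "fullName": {"first": "Grace", "last": "Hopper"},  # restructured field
    "tags": [],
}


def display_name(doc: dict) -> str:
    """Return a printable name, whichever of the two shapes the document has."""
    if "fullName" in doc:
        full = doc["fullName"]
        return f"{full['first']} {full['last']}"
    return doc["name"]


names = [display_name(d) for d in (old_shape, new_shape)]
```

Both dicts could sit in the same collection, which is why code that reads them has to branch on shape rather than assume one layout.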

Collection — a grouping of documents, roughly analogous to a table in SQL. A collection does not enforce a fixed schema by default, though you can add validation rules.

“We moved the audit logs into a separate collection to keep the main orders collection lean.”

BSON — Binary JSON. MongoDB stores documents on disk and transmits them over the wire in BSON format rather than plain text JSON. BSON supports additional data types (such as dates and binary data) that standard JSON does not. As a developer you rarely interact with BSON directly, but it is worth knowing the term when reading driver documentation.

“The driver serialises your Python dict to BSON before writing — that is why the date field arrives as an ISODate rather than a plain string.”
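The difference in data types is easy to see from the JSON side. The sketch below uses only the Python standard library (no MongoDB driver) to show that plain JSON has nowhere to put a date, which is the gap BSON's native date type fills. The event and createdAt field names are made up for the example.

```python
import json
from datetime import datetime, timezone

doc = {"event": "signup", "createdAt": datetime(2024, 1, 15, tzinfo=timezone.utc)}

try:
    json.dumps(doc)
    json_accepts_dates = True
except TypeError:
    # Standard JSON has no date type, so the stdlib encoder refuses datetime.
    json_accepts_dates = False

# The usual plain-JSON workaround is an ISO 8601 string, which drops the
# type information that BSON would have preserved.
as_text = json.dumps({**doc, "createdAt": doc["createdAt"].isoformat()})
```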

ObjectId — the default type MongoDB uses for the _id field. An ObjectId is a 12-byte value made up of a 4-byte timestamp, a 5-byte random value generated once per process, and a 3-byte incrementing counter, making collisions extremely unlikely even across distributed nodes.

“If you are generating IDs on the client side, make sure you are using a proper ObjectId rather than a random string, otherwise the index won’t sort chronologically.”
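The chronological ordering comes from the leading four bytes of an ObjectId, which hold a Unix timestamp in seconds. A minimal sketch of decoding it with the standard library follows; the helper name objectid_timestamp and the sample value are chosen for illustration. If you use PyMongo, its ObjectId type exposes the same information as generation_time.

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Decode the creation time stored in the first 4 bytes of an ObjectId."""
    if len(oid_hex) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(oid_hex[:8], 16)  # first 4 bytes: big-endian Unix seconds
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Sample value, used here purely as an illustration.
created = objectid_timestamp("507f1f77bcf86cd799439011")
```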

Embedded document vs reference — two strategies for modelling relationships. An embedded document nests related data directly inside a parent document (good for data you always read together). A reference stores only the _id of a related document in another collection and requires a separate query or a $lookup to resolve it (good for data that is large, frequently updated independently, or shared across many documents).

“We debated embedding the address in the user document, but since multiple orders reference the same address we went with a reference instead.”

“For the product images we are embedding the URLs — they are small and we always need them when we fetch the product.”
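The two strategies look like this when written out as data. This is a sketch with plain Python dicts standing in for collections; the names product, addresses, orders, addressId, and resolve_address are all hypothetical.

```python
# Embedded: the image URLs live inside the product document itself.
product = {
    "_id": "p1",
    "name": "Desk lamp",
    "images": [{"url": "https://example.com/lamp-front.jpg"}],
}

# Referenced: each order stores only the _id of a shared address.
addresses = {"a1": {"_id": "a1", "city": "Leeds"}}  # stands in for a collection
orders = [
    {"_id": "o1", "addressId": "a1"},
    {"_id": "o2", "addressId": "a1"},
]


def resolve_address(order: dict) -> dict:
    """Follow the reference: an extra lookup, like a second query or a $lookup."""
    return addresses[order["addressId"]]


cities = [resolve_address(o)["city"] for o in orders]
```

Reading the product needs one fetch; reading an order's address needs two steps, but the address exists only once and can be updated in one place.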

Query & Aggregation

Aggregation pipeline — a sequence of processing stages that transform a set of documents into a result. Each stage receives the output of the previous stage, allowing you to filter, reshape, group, and join data in a single operation.

“The reporting endpoint is too slow because it is doing all the grouping in application code. Let’s move that logic into an aggregation pipeline.”

$match — a pipeline stage that filters documents using query conditions, similar to a SQL WHERE clause. Placing $match early in a pipeline reduces the number of documents passed to later stages.

“Add a $match at the top of the pipeline to filter by status: 'active' — right now we are pulling the entire collection into memory.”

$group — a pipeline stage that groups documents by a specified expression and can compute aggregate values such as sums, averages, and counts.

“The $group stage calculates total revenue per region, then $sort orders the results descending.”
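A pipeline is written as an ordered list of stage documents. The sketch below shows one such list (the form you would pass to a driver call such as PyMongo's collection.aggregate) next to a plain-Python equivalent, so you can see what each stage contributes. The orders data and the field names region, status, and amount are invented for the example.

```python
from collections import defaultdict

# The pipeline as a list of stage documents.
pipeline = [
    {"$match": {"status": "active"}},
    {"$group": {"_id": "$region", "totalRevenue": {"$sum": "$amount"}}},
    {"$sort": {"totalRevenue": -1}},
]

# Hypothetical input documents.
orders = [
    {"region": "EU", "status": "active", "amount": 120},
    {"region": "US", "status": "active", "amount": 300},
    {"region": "EU", "status": "cancelled", "amount": 999},
    {"region": "EU", "status": "active", "amount": 80},
]

# Plain-Python equivalent of the three stages.
matched = [o for o in orders if o["status"] == "active"]  # $match
totals = defaultdict(int)
for o in matched:                                         # $group with $sum
    totals[o["region"]] += o["amount"]
result = sorted(                                          # $sort, descending
    ({"_id": region, "totalRevenue": total} for region, total in totals.items()),
    key=lambda row: row["totalRevenue"],
    reverse=True,
)
```

The cancelled order never reaches the grouping step, which is the practical reason for putting $match first.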

$project — a pipeline stage that reshapes each document by including or excluding fields, renaming them, or computing new values. It is roughly analogous to the column list of a SQL SELECT.

“Use $project to strip out the internal audit fields before the response leaves the API layer.”

$lookup — a pipeline stage that performs a left outer join between the current collection and another collection in the same database, adding matching documents as an array field.

“We replaced the two-query pattern with a $lookup so the whole thing runs server-side in one round trip.”
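The "left outer join" behaviour is worth seeing once: every document from the current collection survives, and unmatched ones simply get an empty array. The sketch below pairs a $lookup stage document with a plain-Python emulation. The collection and field names (customers, customerId, customer) and the helper left_outer_join are assumptions made for the example.

```python
# The stage document, with the four options of the basic $lookup form.
lookup_stage = {
    "$lookup": {
        "from": "customers",         # the other collection
        "localField": "customerId",  # field in the current documents
        "foreignField": "_id",       # field in the "from" collection
        "as": "customer",            # name of the new array field
    }
}

orders = [{"_id": 1, "customerId": "c1"}, {"_id": 2, "customerId": "c9"}]
customers = [{"_id": "c1", "name": "Ada"}]


def left_outer_join(left, right, local_field, foreign_field, as_field):
    """Emulate the join: every left document survives, matches go in an array."""
    joined = []
    for doc in left:
        matches = [r for r in right if r[foreign_field] == doc.get(local_field)]
        joined.append({**doc, as_field: matches})
    return joined


spec = lookup_stage["$lookup"]
joined = left_outer_join(
    orders, customers, spec["localField"], spec["foreignField"], spec["as"]
)
```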

Covered query — a query where all the fields in the filter and all the fields returned by the projection are present in a single index. Because _id is returned by default, the projection usually has to exclude it explicitly. MongoDB can satisfy a covered query entirely from the index without reading the underlying documents, which is significantly faster.

“I checked the explain plan and it is a covered query — no document fetches at all, just an index scan.”
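As a rough mental model, "covered" means the index contains everything the query touches. The helper below, is_covered, is a deliberately simplified sketch and not how the server decides: it ignores real exceptions such as multikey indexes, and the gameId and score field names are only an example.

```python
def is_covered(index_fields, filter_fields, projection):
    """Simplified check: are the filter and the returned fields all in the index?

    `projection` maps field name -> 1 (include) or 0 (exclude), as in a query.
    """
    indexed = set(index_fields)
    returned = {field for field, flag in projection.items() if flag}
    if not returned:
        return False  # no inclusion projection means whole documents come back
    # _id is returned by default unless the projection excludes it explicitly.
    if projection.get("_id", 1) and "_id" not in indexed:
        return False
    return set(filter_fields) <= indexed and returned <= indexed


index = ["gameId", "score"]
covered = is_covered(index, ["gameId"], {"gameId": 1, "score": 1, "_id": 0})
not_covered = is_covered(index, ["gameId"], {"gameId": 1, "score": 1})
```

The second call fails only because _id is still being returned, which is the most common reason a query that looks covered is not.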

Explain plan — the output of cursor.explain() or db.collection.explain(), which describes how MongoDB executed (or plans to execute) a query. The explain plan shows whether an index was used, how many documents were examined, and where time was spent.

“Before deploying that query to production, run an explain plan in staging and make sure it is not doing a COLLSCAN.”
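Checking for a COLLSCAN can be automated. Explain output nests the chosen plan under queryPlanner.winningPlan, with each stage pointing at its input through inputStage. The sketch below walks that chain; the sample dict is hand-written and heavily trimmed, and real output has many more fields and can be nested differently depending on the server version and the query.

```python
def plan_stages(plan: dict) -> list:
    """Collect stage names from a winning plan, outermost stage first."""
    stages = [plan["stage"]]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    return stages


# Hand-written, trimmed sample in the general shape of explain() output.
explain_output = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "SORT",
            "inputStage": {"stage": "COLLSCAN", "direction": "forward"},
        }
    }
}

stages = plan_stages(explain_output["queryPlanner"]["winningPlan"])
needs_index = "COLLSCAN" in stages
```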

Indexes

Index — a data structure that MongoDB maintains alongside a collection to speed up queries. Without an index, MongoDB must scan every document in a collection to find matches (a collection scan). Indexes come in several types.

Single-field index — an index on one field. The most common type, and a good starting point when you consistently query or sort by that field.

“There is no index on createdAt — add a single-field index and the date-range queries should drop from seconds to milliseconds.”

Compound index — an index on two or more fields. Field order matters: MongoDB can use a compound index to satisfy queries on a prefix of the indexed fields but not on a suffix alone.

“We have a compound index on userId and createdAt, so sorting by date per user is fast, but querying by createdAt alone won’t use it.”
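The prefix rule can be expressed in a few lines. The function below, usable_prefix, is a hypothetical helper that models only equality filters; the real query planner also considers sorts, ranges, and other factors.

```python
def usable_prefix(index_fields, query_fields):
    """Return the leading index fields that an equality query can make use of."""
    prefix = []
    for field in index_fields:
        if field not in query_fields:
            break  # a gap ends the usable prefix
        prefix.append(field)
    return prefix


index = ["userId", "createdAt"]
by_user = usable_prefix(index, {"userId"})
by_both = usable_prefix(index, {"userId", "createdAt"})
by_date_only = usable_prefix(index, {"createdAt"})
```

An empty result for by_date_only is the "suffix alone" case: the query skips the first indexed field, so this model says the index gives it nothing to work with.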

Multikey index — an index on a field whose value is an array. MongoDB creates a separate index entry for each element of the array, enabling efficient queries on array contents.

“The tags field is an array, so the index we created on it is automatically multikey, and you can query for any tag without scanning every document.”
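"One index entry per array element" can be pictured as an inverted lookup from element to document. The sketch below builds such a structure in plain Python; it illustrates the idea only and is not MongoDB's on-disk format, and the posts data is invented.

```python
from collections import defaultdict

posts = [
    {"_id": 1, "tags": ["mongodb", "indexing"]},
    {"_id": 2, "tags": ["mongodb"]},
    {"_id": 3, "tags": []},
]

# One entry per array element, each pointing back to its document.
tag_index = defaultdict(list)
for post in posts:
    for tag in post["tags"]:
        tag_index[tag].append(post["_id"])

mongodb_posts = tag_index["mongodb"]
```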

Text index — a specialised index that tokenises string fields to support full-text search with the $text operator, including stemming and stop-word filtering.

“We added a text index on the description field so users can search by keyword without us spinning up Elasticsearch.”

Geospatial index — an index optimised for location data. The 2dsphere index supports queries on GeoJSON objects (points, lines, polygons) for operations like “find all venues within 5 km.”

“The store locator uses a 2dsphere index on the location field — the $near query returns results sorted by distance automatically.”
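A $near filter is just a nested document. The sketch below builds one for a "within 5 km" search; near_query is a hypothetical helper, the location field name follows the example sentence, and the coordinates are illustrative. The detail that most often trips people up is that GeoJSON lists longitude before latitude.

```python
def near_query(longitude: float, latitude: float, max_metres: int) -> dict:
    """Build a $near filter for a field that holds GeoJSON points."""
    return {
        "location": {
            "$near": {
                "$geometry": {
                    "type": "Point",
                    # GeoJSON order is [longitude, latitude], not the reverse.
                    "coordinates": [longitude, latitude],
                },
                "$maxDistance": max_metres,  # metres when used with 2dsphere
            }
        }
    }


# Illustrative coordinates (central London), radius 5 km.
query = near_query(-0.1276, 51.5072, 5000)
```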

Reliability & Scale

Write concern — a setting that controls how many members of a replica set must acknowledge a write operation before MongoDB reports it as successful. A higher write concern increases durability at the cost of latency.

“We set write concern to majority on the payments collection. We would rather take the extra milliseconds than risk losing an acknowledged write during a failover.”
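The word "majority" translates into a concrete number of acknowledgements. The sketch below shows the basic arithmetic together with a write concern document. It is a simplification: the server's real calculation depends on which members vote and hold data, and the 5000 ms timeout is an arbitrary example value.

```python
def majority(voting_members: int) -> int:
    """Smallest number of members that is more than half of the set."""
    return voting_members // 2 + 1


# Write concern document: "w" is the acknowledgement level,
# "wtimeout" (milliseconds) caps how long to wait for it.
payments_write_concern = {"w": "majority", "wtimeout": 5000}

acks_needed = {n: majority(n) for n in (3, 5, 7)}
```

For a three-member set this gives two acknowledgements, so a write confirmed at majority exists on at least two members.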

Read preference — a setting that controls which member of a replica set your application reads from. primary (the default) reads from the primary only; secondary reads only from secondaries, which can reduce load on the primary but may return slightly stale data; secondaryPreferred reads from a secondary when one is available and falls back to the primary otherwise.

“The analytics queries are running on secondaryPreferred to keep load off the primary — a few seconds of staleness is fine for dashboards.”
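Read preference is commonly set in the connection string. The sketch below assembles one with the standard library; the host names and the replica set name rs0 are hypothetical, and the staleness bound of 120 seconds is an arbitrary example.

```python
from urllib.parse import urlencode

# Hypothetical hosts and replica set name.
hosts = "db1.example.com:27017,db2.example.com:27017,db3.example.com:27017"
options = {
    "replicaSet": "rs0",
    "readPreference": "secondaryPreferred",
    "maxStalenessSeconds": 120,  # skip secondaries that lag further than this
}

analytics_uri = f"mongodb://{hosts}/?{urlencode(options)}"
```

Keeping a separate URI for analytics traffic means the rest of the application can stay on the default primary reads.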

Replica set — a group of MongoDB instances (typically three or more) that maintain the same dataset. One member is the primary (accepts writes); the others are secondaries (replicate from the primary). If the primary becomes unavailable, the secondaries elect a new primary automatically.

“We had a brief outage last night when the primary went down, but the replica set elected a new primary in about ten seconds and everything recovered.”

Sharding — a method of distributing data across multiple servers (shards) so that no single machine holds the entire dataset. Sharding is MongoDB’s horizontal scaling strategy.

“Once the collection grows past a few hundred gigabytes we will need to think about sharding — the current single-shard setup won’t hold.”

Shard key — the field (or compound of fields) used to determine which shard a document belongs to. Choosing the right shard key is critical: a poor choice leads to uneven data distribution (hotspots).

“The team spent a week debating the shard key — we settled on a compound of tenantId and createdAt to get even distribution and locality.”

Chunk — in a sharded cluster, MongoDB divides the key space into contiguous ranges called chunks and assigns each chunk to a shard. The balancer moves chunks between shards to maintain even distribution.

“The balancer was running at peak traffic because too many chunks had accumulated on one shard, so we adjusted the chunk size to reduce the churn.”
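Routing a document to a shard comes down to finding which range its shard key falls into. The sketch below models ranged sharding on a numeric key with a sorted list of lower bounds. The boundaries and shard names are invented, and hashed sharding would apply the same idea to a hash of the key rather than to the key itself.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on a numeric shard key.
# Chunk i covers keys from lower_bounds[i] up to (not including) the next bound.
lower_bounds = [float("-inf"), 1000, 5000, 9000]
chunk_owner = ["shardA", "shardB", "shardA", "shardC"]


def shard_for(key: int) -> str:
    """Find the chunk whose range contains the key, then return its shard."""
    chunk = bisect_right(lower_bounds, key) - 1
    return chunk_owner[chunk]


placements = {key: shard_for(key) for key in (42, 1000, 4999, 12000)}
```

In this model, moving a chunk is just a change to chunk_owner; the ranges themselves stay where they are.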

Change stream — a feature that allows applications to subscribe to a real-time stream of data changes (inserts, updates, deletes) on a collection, database, or entire cluster. Change streams are built on MongoDB’s oplog and support resumable consumption.

“We replaced the polling loop with a change stream — now the notification service reacts to new orders in under a second instead of on a 30-second interval.”
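Each change arrives as an event document with an operationType, a documentKey, and an _id that doubles as the resume token. The sketch below processes a hand-written list of events in that general shape; handle_events is a hypothetical consumer, and in a real application the events would come from a driver call such as PyMongo's collection.watch().

```python
def handle_events(events):
    """Dispatch on operationType and remember the last resume token seen."""
    notifications, resume_token = [], None
    for event in events:
        if event["operationType"] == "insert":
            notifications.append(f"new order {event['fullDocument']['_id']}")
        elif event["operationType"] == "delete":
            notifications.append(f"order {event['documentKey']['_id']} removed")
        resume_token = event["_id"]  # an event's _id is its resume token
    return notifications, resume_token


# Hand-written events in the general shape of change stream documents.
sample_events = [
    {"_id": {"_data": "token-1"}, "operationType": "insert",
     "documentKey": {"_id": 101}, "fullDocument": {"_id": 101, "total": 40}},
    {"_id": {"_data": "token-2"}, "operationType": "delete",
     "documentKey": {"_id": 77}},
]

notifications, resume_token = handle_events(sample_events)
```

Persisting the last token is what makes consumption resumable: after a restart, the consumer can ask to continue from that point instead of starting over.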

How to Use These in Conversation

Discussing a schema design decision:

“I am not sure whether to embed the line items or reference a separate orderItems collection. Embedding keeps it to one document fetch, but a large order could push the document past the 16 MB BSON limit. What do you think?”

Reviewing a slow query:

“I ran the explain plan on the leaderboard query and it is doing a COLLSCAN because there is no compound index on gameId and score. If we add that index and project only those two fields (excluding _id), it should become a covered query and response time should drop dramatically.”

Planning for scale:

“Once we hit around 500 GB of user data we should start thinking about sharding. The tricky part is choosing a shard key that gives us even distribution — if we pick userId alone we might get hotspots for power users.”

Proposing a real-time feature:

“Instead of polling the database every minute for new job postings, we could open a change stream on the jobs collection. The UI would update in near real-time and we would eliminate a lot of unnecessary read load.”

Quick Reference

| Term | One-line definition |
| --- | --- |
| Document | A single BSON record; the basic unit of storage |
| Collection | A group of documents (like a table, but schema-flexible) |
| ObjectId | Default auto-generated `_id`; encodes a timestamp, a per-process random value, and a counter |
| Aggregation pipeline | A chain of stages that filter, transform, and join documents |
| Covered query | A query resolved entirely from an index, with no document reads |
| Explain plan | Query execution report; shows index usage and document scan counts |
| Write concern | How many replica set members must confirm a write before it is “done” |
| Read preference | Which replica set member to read from (primary vs secondary) |
| Replica set | A group of MongoDB nodes sharing the same data for high availability |
| Change stream | Real-time subscription to insert/update/delete events on a collection, database, or cluster |
