English for MongoDB
Learn the English vocabulary for MongoDB (documents, indexes, and sharding) so you can discuss NoSQL database operations clearly.
A slow MongoDB query, a bad shard key, and a schema that grew unmanageable each produce very different symptoms, and naming which one you’re dealing with — instead of a generic “the database is slow” — is what actually moves a diagnosis forward.
Key Vocabulary
Document — a single record in MongoDB, stored as a BSON (binary JSON) object, roughly the equivalent of a row in a relational database but with a flexible, nested structure.
“Each user document embeds their own address directly instead of referencing a separate table — that’s normal in MongoDB’s document model, even though it’d be a foreign key in a relational schema.”
Collection — a group of documents in MongoDB, analogous to a table but, by default, without an enforced schema, so documents in the same collection can have different fields.
“We added a new field to some documents in the users collection without a migration — that’s fine in MongoDB, but it means older documents just won’t have that field until they’re updated.”
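As a rough illustration of the two terms above, documents can be pictured as nested Python dictionaries and a collection as a list of them. The names and field values here are invented for the sketch, and a plain list is only a stand-in for a real collection.

```python
# Hypothetical documents from a "users" collection, modelled as plain dicts.
users = [
    # Older document: no "nickname" field.
    {"_id": 1, "name": "Ada", "address": {"city": "London", "country": "UK"}},
    # Newer document: same collection, one extra field.
    {"_id": 2, "name": "Lin", "nickname": "lin",
     "address": {"city": "Taipei", "country": "TW"}},
]

# The address is embedded in the document, so no join is needed to read it.
first_city = users[0]["address"]["city"]

# Collect the distinct sets of top-level field names ("shapes") in the collection.
shapes = {frozenset(doc.keys()) for doc in users}
```

Two shapes in one collection is exactly the situation the example sentence describes: nothing is broken, but the documents do not all look alike.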
Index — a data structure that speeds up queries on specific fields, at the cost of extra write overhead and storage; without one, MongoDB scans every document in a collection to satisfy a query.
“That query’s taking two seconds because there’s no index on the email field — it’s doing a full collection scan on every login attempt.”
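The difference between a full collection scan and an indexed lookup can be sketched with a toy model. This is not how MongoDB implements indexes internally (a Python dict is only a stand-in), and the collection and field names are assumptions; the point is the number of documents examined in each case.

```python
# Toy "collection": 10,000 documents with a unique email field.
users = [{"_id": i, "email": f"user{i}@example.com"} for i in range(10_000)]


def find_by_scan(collection, email):
    """Check documents one by one, like a full collection scan."""
    examined = 0
    for doc in collection:
        examined += 1
        if doc["email"] == email:
            return doc, examined
    return None, examined


# Build the index once. Every later insert would also have to update it,
# which is the extra write overhead and storage the definition mentions.
email_index = {doc["email"]: doc for doc in users}


def find_by_index(index, email):
    """Jump straight to the matching document through the index."""
    doc = index.get(email)
    examined = 1 if doc is not None else 0
    return doc, examined


target = "user9999@example.com"  # worst case for the scan: the last document
scan_doc, scan_examined = find_by_scan(users, target)
index_doc, index_examined = find_by_index(email_index, target)
```

Against a real deployment, the equivalent steps would be creating the index (for example with PyMongo’s `create_index`) and checking the query’s `explain` output to see whether the plan reports a `COLLSCAN` or an index scan.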
Sharding — horizontally partitioning a collection’s data across multiple servers based on a shard key, used to scale beyond what a single server can handle in storage or throughput.
“We sharded the events collection by userId once it passed a billion documents on one node — a single server just couldn’t hold the data or the write throughput anymore.”
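A minimal sketch of what “partitioning by a shard key” means, assuming a four-shard cluster and a simple hash-and-modulo routing rule. Real MongoDB assigns ranges of key values to shards rather than taking a modulo, so treat `shard_for` as a hypothetical helper that only illustrates the idea.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size for this sketch


def shard_for(key_value, num_shards=NUM_SHARDS):
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical events collection: 10,000 events spread over 500 users.
events = [{"_id": n, "userId": f"user-{n % 500}"} for n in range(10_000)]

# Route every event by its userId and count how many land on each shard.
docs_per_shard = Counter(shard_for(event["userId"]) for event in events)

# Every event for a given user is routed to the same shard.
shards_for_user_42 = {
    shard_for(event["userId"]) for event in events if event["userId"] == "user-42"
}
```

Because the routing depends only on the shard key value, all of one user’s events stay together while the collection as a whole is split across servers.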
Aggregation pipeline — a sequence of stages (filtering, grouping, transforming) applied to documents to compute derived results; it is MongoDB’s primary mechanism for the kind of analytics a GROUP BY would handle in SQL.
“We’re computing monthly revenue with an aggregation pipeline instead of pulling every document into the application and summing it there. It’s much faster because the work runs inside the database and only the monthly totals cross the network.”
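To make “a sequence of stages” concrete, here is a sketch of a monthly revenue pipeline written as Python data in MongoDB’s stage syntax. The field names (`status`, `month`, `amount`) are assumptions. The helper function below it is a plain-Python equivalent, included only to show what the three stages compute on a small in-memory sample.

```python
from collections import defaultdict

# The pipeline itself: filter, then group and sum, then sort.
monthly_revenue_pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$month", "revenue": {"$sum": "$amount"}}},
    {"$sort": {"_id": 1}},
]


def monthly_revenue_in_app(orders):
    """Plain-Python equivalent of the three stages, for illustration only."""
    totals = defaultdict(int)
    for order in orders:
        if order.get("status") == "paid":              # $match
            totals[order["month"]] += order["amount"]  # $group with $sum
    # $sort by the group key, ascending
    return [{"_id": month, "revenue": total} for month, total in sorted(totals.items())]


# Hypothetical sample data.
orders = [
    {"month": "2024-01", "amount": 120, "status": "paid"},
    {"month": "2024-01", "amount": 80, "status": "paid"},
    {"month": "2024-02", "amount": 200, "status": "paid"},
    {"month": "2024-02", "amount": 50, "status": "refunded"},
]
result = monthly_revenue_in_app(orders)
```

With PyMongo, the pipeline list would be passed to a collection’s `aggregate` method so the database does the work; the in-application version is the approach the example sentence advises against for large collections.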
Common Phrases
- “Is this query slow because there’s no index, or is something else going on?”
- “What’s the shard key for this collection?”
- “Is this an aggregation pipeline problem, or is the underlying query itself slow?”
- “Do all the documents in this collection actually share the same shape?”
- “Is this a full collection scan?”
Example Sentences
Diagnosing a slow query:
“This query’s doing a full collection scan because there’s no index on the field it’s filtering by — once we add one, this should go from two seconds to under ten milliseconds.”
Explaining a scaling decision:
“We sharded by tenantId instead of _id specifically because our access pattern is almost always scoped to one tenant. Sharding on _id would’ve scattered each tenant’s data across every shard and made every tenant-scoped query hit all of them.”
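The reasoning in that sentence can be checked with a small simulation. It assumes four shards and a hypothetical hash-and-modulo routing helper (not MongoDB’s actual placement logic), and counts how many shards hold data for one tenant under each choice of shard key.

```python
import hashlib

NUM_SHARDS = 4  # assumed cluster size for this sketch


def shard_for(key_value, num_shards=NUM_SHARDS):
    """Hypothetical routing rule: stable hash of the key, modulo shard count."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


# Hypothetical collection: 2,000 orders spread evenly over 20 tenants.
orders = [{"_id": n, "tenantId": f"tenant-{n % 20}"} for n in range(2_000)]


def shards_touched(docs, shard_key, tenant_id):
    """Shards holding at least one of this tenant's documents."""
    return {shard_for(doc[shard_key]) for doc in docs if doc["tenantId"] == tenant_id}


# Keyed by tenantId: all of the tenant's documents share one key value.
by_tenant = shards_touched(orders, "tenantId", "tenant-7")

# Keyed by _id: each document has its own key value, so placement scatters.
by_id = shards_touched(orders, "_id", "tenant-7")
```

A query scoped to `tenant-7` only needs the shards in `by_tenant` in the first layout, but has to visit every shard in `by_id` in the second.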
Describing a schema evolution:
“Since MongoDB doesn’t enforce a fixed schema, older documents in this collection are missing the status field we added last quarter — the application code has to handle that with a default rather than assuming it’s always present.”
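In application code, “handle that with a default” can be as small as the sketch below. The default value and the sample documents are hypothetical.

```python
DEFAULT_STATUS = "active"  # assumed default for documents written before the field existed


def status_of(doc):
    """Return the document's status, falling back to the default when the field is absent."""
    return doc.get("status", DEFAULT_STATUS)


old_doc = {"_id": 1, "name": "Ada"}                         # predates the status field
new_doc = {"_id": 2, "name": "Lin", "status": "suspended"}  # written after it was added

old_status = status_of(old_doc)
new_status = status_of(new_doc)
```

Reading the field through one helper keeps the fallback in a single place instead of scattering `doc["status"]` lookups that would fail on older documents.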
Professional Tips
- Say document, not “row,” when describing MongoDB records; it signals you understand the nested, flexible-schema structure instead of thinking of it as a relational table.
- Always check for a missing index before assuming a slow query is a hardware or scaling problem — an unindexed field causing a full collection scan is one of the most common and cheapest-to-fix causes of slowness.
- Justify the shard key choice explicitly in any scaling discussion — a poorly chosen shard key can make sharding actively worse by concentrating load instead of distributing it.
- Use aggregation pipeline, not “some Mongo query,” when describing computed analytics — it tells a teammate exactly what kind of operation to look at and optimize.
Practice Exercise
- Write a sentence explaining why a missing index can cause a slow query.
- Explain what a shard key does and why choosing a bad one is a problem.
- Describe when you’d use an aggregation pipeline instead of pulling documents into application code.