A document is MongoDB's basic unit of data, a record stored in BSON (a binary form of JSON) consisting of field-value pairs. Documents can nest sub-documents and arrays, allowing rich, hierarchical structures that map naturally to objects in code. Unlike rigid relational rows, documents in a collection need not share an identical schema, giving flexibility. Each document has a unique _id field. This document model is what makes MongoDB a document database rather than a table-based one.
2 / 5
What is the aggregation pipeline?
The aggregation pipeline processes documents through a series of stages, each transforming the stream before passing it on. Stages like $match filter, $group aggregates, $project reshapes, $sort orders, and $lookup joins collections. Documents flow through the pipeline much like a Unix command chain, enabling powerful analytics and transformations on the server side. The pipeline is MongoDB's primary tool for computing summaries, joins, and reshaped results that a simple find query cannot express.
3 / 5
What is an index in MongoDB?
An index is a special data structure, typically a B-tree, that stores a small portion of a collection's data in an easily traversable form to speed up queries. Without a matching index, MongoDB performs a collection scan, examining every document, which is slow at scale. Indexes can cover single fields, multiple fields (compound), arrays (multikey), text, or geospatial data. They speed reads but add overhead to writes and consume storage, so you index the fields your queries actually filter and sort on.
4 / 5
What is sharding in MongoDB?
Sharding horizontally partitions a collection across multiple servers, called shards, so the dataset and its load exceed what a single machine could handle. A shard key determines how documents are distributed, and a router (mongos) directs queries to the relevant shards. Good shard key selection is crucial to avoid hotspots and uneven distribution. Sharding enables scaling beyond one server's capacity for both storage and throughput, at the cost of added operational complexity in the cluster.
5 / 5
What is a replica set in MongoDB?
A replica set is a group of mongod processes that maintain identical copies of the data to provide redundancy and high availability. One member is the primary, accepting writes, while secondaries replicate its operations. If the primary fails, an automatic election promotes a secondary, minimizing downtime. Secondaries can also serve reads to spread load. Replica sets protect against hardware failure and are the recommended baseline for any production MongoDB deployment, complementing sharding for scale.