ChromaDB is one of the most popular open-source vector databases for AI applications. Understanding collections, embedding queries, metadata filtering syntax, and the difference between client types is essential for building efficient similarity search into your applications.
1 / 5
In ChromaDB, what is a Collection?
A Collection is ChromaDB's primary data container. It holds a set of embeddings alongside their associated documents and metadata. You create, get, or delete collections by name and perform all CRUD and query operations against them.
2 / 5
When you call collection.query(query_texts=['...'], n_results=5), what does n_results control?
n_results specifies how many of the closest embedding matches to return for each query. ChromaDB ranks candidate embeddings by distance (e.g., cosine, L2) and returns the top n_results items, closest first, with their IDs, documents, metadata, and distances.
3 / 5
ChromaDB supports metadata filtering with where clauses. What is the correct operator to match documents where year is greater than 2022?
ChromaDB uses a MongoDB-style filter syntax. $gt is the greater-than operator, so {'year': {'$gt': 2022}} matches documents where the year metadata field exceeds 2022. Other supported comparison operators include $gte, $lt, $lte, $eq, and $ne, plus $in and $nin for membership tests.
4 / 5
What does using a PersistentClient instead of an EphemeralClient in ChromaDB mean for your data?
PersistentClient stores collections and embeddings in a local directory on disk (in current releases, a SQLite database plus HNSW index files), so data is durable across restarts. EphemeralClient keeps everything in RAM and loses all data when the process exits; it is the same in-memory behavior you get from a default chromadb.Client().
5 / 5
ChromaDB lets you pick the distance function per collection; the default is L2 (squared Euclidean), with cosine and inner product as alternatives. When would you choose L2 (Euclidean) distance rather than cosine?
Cosine distance measures angular similarity and ignores vector magnitude, which is useful when only the direction (relative proportions) matters. L2 distance accounts for both direction and magnitude, making it appropriate when the absolute scale of embedding dimensions is semantically significant. For unit-normalized embeddings the two metrics produce the same ranking, so the choice only matters when vector lengths vary.
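To make the difference concrete, here is a plain-Python sketch that does not use ChromaDB. The helper names and the two toy vectors are made up for illustration; the vectors point in the same direction but differ in length by a factor of ten.

```python
import math


def cosine_distance(a, b):
    """1 minus cosine similarity; depends only on the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance; sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


short_vec = [1.0, 2.0]
long_vec = [10.0, 20.0]  # same direction, ten times the magnitude

cos_d = cosine_distance(short_vec, long_vec)
l2_d = l2_distance(short_vec, long_vec)
print(cos_d, l2_d)
```

Cosine treats the two vectors as effectively identical, while L2 reports them as far apart, which is the behavior to want only if vector length carries meaning in your embeddings.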