Build fluency in the vocabulary of storing and retrieving embeddings by semantic similarity.
1 / 5
At standup, a dev mentions storing document embeddings so that a query can retrieve the most semantically similar entries rather than ones matching exact keywords. What kind of database supports this?
A vector database stores high-dimensional embeddings and retrieves entries by semantic similarity rather than exact keyword matching, letting a query surface conceptually related results even when the wording differs. This underlies most modern semantic search and retrieval-augmented generation systems. A traditional relational database indexed on exact keys isn't built for this kind of similarity comparison.
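To make "retrieves entries by semantic similarity" concrete, here is a minimal sketch of the core operation: score every stored embedding against the query embedding with cosine similarity and keep the top matches. The document IDs and 3-dimensional vectors are hypothetical and hand-picked for illustration. Real embedding models produce hundreds or thousands of dimensions, and real vector databases avoid scoring every vector.

```python
import math

# Hypothetical toy "embeddings"; a real model would produce these from text.
STORE = {
    "doc_refund_policy": [0.9, 0.1, 0.0],
    "doc_return_items": [0.8, 0.2, 0.1],
    "doc_office_hours": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, k=2):
    """Exact nearest-neighbor search: score every stored vector, keep the top k."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in STORE.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# A query embedding that points in roughly the same direction as the
# refund/return documents ranks them above the unrelated one.
top = search([0.85, 0.15, 0.05])
```

The ranking depends only on vector direction and ignores shared words. That is why two documents with different wording can both surface for the same query.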
2 / 5
During a design review, the team wants to reduce search latency by grouping nearby vectors so a query only compares against a relevant subset instead of every stored vector. Which capability supports this?
Approximate nearest neighbor indexing organizes vectors into a structure, such as a graph or a set of clusters, so a query only needs to compare against a relevant subset rather than every vector in the store. This trades a small amount of accuracy (the search can occasionally miss a true nearest neighbor) for a large gain in search speed at scale. A full linear scan stays exact but becomes impractically slow as the number of stored vectors grows.
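The cluster-based approach can be sketched as a toy IVF-style (inverted file) index. Each vector is stored in the bucket of its nearest centroid, and a query only scans the `n_probe` closest buckets. This is a simplified illustration under stated assumptions: the class name and parameters are hypothetical, and the centroids are passed in by hand. A production index would learn centroids from the data, for example with k-means.

```python
import math


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class ClusterIndex:
    """Toy IVF-style approximate nearest-neighbor index (hypothetical)."""

    def __init__(self, centroids):
        # Centroids are supplied by the caller here; a real index learns them.
        self.centroids = centroids
        self.buckets = {i: [] for i in range(len(centroids))}

    def _nearest_centroids(self, vec, n):
        order = sorted(range(len(self.centroids)), key=lambda i: dist(vec, self.centroids[i]))
        return order[:n]

    def add(self, doc_id, vec):
        # Each vector lives in the bucket of its closest centroid.
        bucket = self._nearest_centroids(vec, 1)[0]
        self.buckets[bucket].append((doc_id, vec))

    def search(self, query, k=1, n_probe=1):
        # Only vectors in the n_probe closest buckets are compared. This makes
        # the search fast, and it is also why a true neighbor can be missed.
        candidates = []
        for b in self._nearest_centroids(query, n_probe):
            candidates.extend(self.buckets[b])
        candidates.sort(key=lambda pair: dist(query, pair[1]))
        return [doc_id for doc_id, _ in candidates[:k]]


index = ClusterIndex(centroids=[[0.0, 0.0], [10.0, 10.0]])
index.add("a", [0.5, 0.2])
index.add("b", [1.0, 1.0])
index.add("c", [9.5, 10.2])
index.add("d", [10.4, 9.8])
```

A query near `[9, 9]` only compares against `c` and `d`, never `a` or `b`. Raising `n_probe` scans more buckets, which improves accuracy at the cost of speed.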
3 / 5
In a code review, a dev notices a search query is configured to also filter results by a metadata field, like a document's category, in addition to vector similarity. What does this represent?
Metadata filtering combined with vector search narrows results to only those matching a specific attribute, like category or date, while still ranking by semantic similarity within that filtered subset. This lets a query be both semantically relevant and constrained to a meaningful scope. Relying on vector similarity alone can surface results that are conceptually close but practically irrelevant, like the wrong category or an outdated document.
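A minimal sketch of filtered vector search follows. The first step keeps only records whose metadata matches every requested field. The second step ranks the survivors by cosine similarity. The records, field names, and the `filtered_search` helper are hypothetical. This version filters before ranking (pre-filtering); some systems instead filter the results of the similarity search afterward.

```python
import math

# Hypothetical records: an embedding plus metadata fields.
RECORDS = [
    {"id": "faq-1", "category": "billing", "year": 2024, "vec": [0.9, 0.1]},
    {"id": "faq-2", "category": "billing", "year": 2019, "vec": [0.95, 0.05]},
    {"id": "faq-3", "category": "shipping", "year": 2024, "vec": [0.92, 0.08]},
]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def filtered_search(query_vec, k=1, **filters):
    # Step 1: keep only records whose metadata matches every filter exactly.
    subset = [r for r in RECORDS if all(r.get(key) == value for key, value in filters.items())]
    # Step 2: rank the remaining records by semantic similarity.
    subset.sort(key=lambda r: cosine(query_vec, r["vec"]), reverse=True)
    return [r["id"] for r in subset[:k]]


query = [1.0, 0.0]
unfiltered = filtered_search(query)                            # closest overall
current_billing = filtered_search(query, category="billing", year=2024)
```

Without filters, the closest match is the outdated 2019 entry. With `category="billing", year=2024`, the query still ranks by similarity but only within the relevant, current subset.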
4 / 5
An incident report shows a vector database's index went stale after the underlying documents were updated, so queries kept returning results based on outdated embeddings. What practice would prevent this?
Re-embedding and re-indexing an updated document as part of the same update pipeline keeps the vector store's search results consistent with the document's current content. Assuming the index stays current on its own skips a necessary step: the embedding was computed from the old text and does not change when the source document does. This synchronization step matters most in systems where source documents change frequently.
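One way to wire this into a pipeline is sketched below. The document write and the embedding upsert happen in the same function, and a content hash skips re-embedding when the text has not changed. The `embed` function is a hypothetical stand-in for a real embedding model call and carries no semantic meaning. `VectorStore` and `update_document` are likewise illustrative names, not a specific library's API.

```python
import hashlib


def embed(text):
    """Hypothetical stand-in for an embedding model call (not semantic)."""
    return [len(text) / 100.0, text.count(" ") / 10.0]


class VectorStore:
    """Minimal in-memory store keyed by document ID (illustrative only)."""

    def __init__(self):
        self.entries = {}  # doc_id -> {"hash": str, "vec": list}

    def upsert(self, doc_id, text):
        # Recompute the embedding only when the content actually changed.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current = self.entries.get(doc_id)
        if current is not None and current["hash"] == digest:
            return False  # unchanged: keep the existing embedding
        self.entries[doc_id] = {"hash": digest, "vec": embed(text)}
        return True


def update_document(docs, store, doc_id, new_text):
    """Update the source document and its embedding in the same step."""
    docs[doc_id] = new_text
    return store.upsert(doc_id, new_text)


docs = {}
store = VectorStore()
update_document(docs, store, "policy", "Refunds within 30 days")
old_vec = store.entries["policy"]["vec"]
changed = update_document(docs, store, "policy", "Refunds within 14 days of purchase")
```

Because every document write goes through `update_document`, the stored embedding cannot silently drift from the text it represents.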
5 / 5
During a PR review, a teammate asks why the team uses a vector database instead of relying on traditional keyword search for this retrieval feature. What is the reasoning?
Keyword search matches literal terms, so it misses a relevant result phrased differently even if the underlying meaning is the same. A vector database compares semantic meaning through embeddings, surfacing conceptually related results regardless of exact wording. The tradeoff is the added complexity of generating, storing, and keeping embeddings up to date compared to a simpler keyword index.
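The keyword side of this tradeoff can be shown with a toy term-overlap search, sketched below. It matches documents that share at least one literal word with the query. The documents and helper names are hypothetical, and the sketch omits the embedding side because that would need a real embedding model.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, docs):
    """Return IDs of documents sharing at least one literal term with the query."""
    q = tokenize(query)
    return [doc_id for doc_id, text in docs.items() if q & tokenize(text)]


DOCS = {
    "d1": "How to reset your password",
    "d2": "Recovering access when you forget your login credentials",
}

hits = keyword_search("forgot password", DOCS)
```

Here `d2` is clearly relevant to "forgot password", but it shares no exact term with the query ("forget" is not "forgot"), so keyword matching misses it. An embedding-based comparison is designed to place such paraphrases close together.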