Practice the vocabulary of keeping stored vectors comparable across an embedding model upgrade.
1 / 5
At standup, a dev mentions that vectors produced by an older embedding model version are no longer meaningfully comparable to vectors produced by a newer version, since the newer model encodes semantic meaning differently. What is this phenomenon called?
Embedding drift describes how vectors produced by an older embedding model version stop being meaningfully comparable to vectors from a newer version, because the model encodes semantic meaning differently after a retrain or architecture change. Assuming vectors remain interchangeable across versions overlooks that two embedding spaces with the same dimensionality can still encode meaning very differently. This drift is what makes silently mixing vectors from two model versions dangerous for similarity search.
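A toy illustration of why equal dimensionality does not imply comparability. The two "models" here are hypothetical stand-ins (`embed_v1` and `embed_v2` are invented for this sketch); the second simply rotates the first one's axes by 90 degrees. Similarity within each version is intact, while the cross-version score for the very same input is not.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical stand-ins for two model versions with the same output size.
def embed_v1(features):
    return [features[0], features[1]]

def embed_v2(features):
    # Same geometry as v1, but with the axes rotated by 90 degrees.
    return [-features[1], features[0]]

doc = (1.0, 0.0)

within_v1 = cosine(embed_v1(doc), embed_v1(doc))      # same input, same version
within_v2 = cosine(embed_v2(doc), embed_v2(doc))      # same input, same version
cross_version = cosine(embed_v1(doc), embed_v2(doc))  # same input, mixed versions
```

In this sketch the identical input scores as a perfect match within either version but as unrelated across versions, which is the failure mode the term names. Real model upgrades change the space in less tidy ways than a rotation.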
2 / 5
During a design review, the team wants every stored vector regenerated with the new embedding model whenever that model's version changes, rather than leaving old vectors untouched alongside new ones. Which capability supports this?
A re-embedding backfill regenerates every stored vector with the new embedding model whenever its version changes, rather than leaving an old, incompatible vector sitting alongside newly generated ones. Leaving old vectors untouched risks a similarity search comparing two vectors that were never meant to be compared directly. This backfill is what keeps a vector store internally consistent across an embedding model upgrade.
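A minimal sketch of such a backfill, assuming an in-memory store where each record keeps its source text, its vector, and a `model_version` field. The record layout, the `backfill` function, and the `embed_v2` stand-in are all hypothetical; a real system would call its actual embedding model and write to its actual vector store, likely in batches.

```python
from typing import Callable, Dict, List

def backfill(
    store: Dict[str, dict],
    embed: Callable[[str], List[float]],
    target_version: str,
) -> int:
    """Re-embed every record not tagged with target_version.

    Returns the number of records that were regenerated.
    """
    updated = 0
    for record in store.values():
        if record["model_version"] != target_version:
            record["vector"] = embed(record["text"])
            record["model_version"] = target_version
            updated += 1
    return updated

# Hypothetical stand-in for the new model's embedding call.
def embed_v2(text: str) -> List[float]:
    return [float(len(text)), 1.0]

store = {
    "a": {"text": "hello", "vector": [0.1, 0.2], "model_version": "v1"},
    "b": {"text": "world!", "vector": [6.0, 1.0], "model_version": "v2"},
}

regenerated = backfill(store, embed_v2, "v2")
```

Keeping the source text (or a reference to it) next to each vector is what makes this possible: a vector cannot be converted from one model version to another without re-running the model on the original input.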
3 / 5
In a code review, a dev notices each stored vector carries a tag recording exactly which embedding model version produced it, letting the system detect a mismatch before comparing two vectors. What does this represent?
Embedding version tagging records exactly which model version produced a stored vector, letting the system detect a mismatch before comparing two vectors that were generated by different, incompatible versions. Storing vectors with no version tag makes such a mismatch invisible until a search result quietly turns out to be nonsensical. This tagging is a lightweight but essential safeguard for any system that might upgrade its embedding model over time.
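One way the tag and the mismatch check might look in code. The `TaggedVector` type and `VersionMismatchError` are names invented for this sketch, not part of any particular vector database.

```python
import math
from dataclasses import dataclass
from typing import Tuple

class VersionMismatchError(ValueError):
    """Raised when two vectors come from different model versions."""

@dataclass(frozen=True)
class TaggedVector:
    values: Tuple[float, ...]
    model_version: str  # which embedding model version produced `values`

def cosine_similarity(a: TaggedVector, b: TaggedVector) -> float:
    # Refuse to compare across versions instead of returning a meaningless score.
    if a.model_version != b.model_version:
        raise VersionMismatchError(
            f"cannot compare {a.model_version} with {b.model_version}"
        )
    dot = sum(x * y for x, y in zip(a.values, b.values))
    norm_a = math.sqrt(sum(x * x for x in a.values))
    norm_b = math.sqrt(sum(x * x for x in b.values))
    return dot / (norm_a * norm_b)

old = TaggedVector((1.0, 0.0), "v1")
new = TaggedVector((1.0, 0.0), "v2")

same_version_score = cosine_similarity(old, TaggedVector((1.0, 0.0), "v1"))

try:
    cosine_similarity(old, new)
    mismatch_detected = False
except VersionMismatchError:
    mismatch_detected = True
```

Failing loudly is a design choice in this sketch; a system could instead log the mismatch and queue the stale vector for re-embedding.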
4 / 5
An incident report shows search relevance degraded sharply right after a new embedding model version was rolled out, because old vectors from the previous version were being compared directly against new query embeddings with no version check catching the mismatch. What practice would prevent this?
Tagging every stored vector with its embedding model version, and re-embedding any older vector before comparing it against a query embedded with the newer version, prevents exactly the kind of relevance collapse this incident describes. Comparing untagged, mismatched vectors directly produces a similarity score that's essentially meaningless. This version-aware handling is what keeps a search system reliable across an embedding model upgrade.
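A rough sketch of version-aware search under simplified assumptions: records live in a plain dictionary, and any stale record is re-embedded lazily at query time before it is scored. The `search` function, the record layout, and the keyword-counting `toy_embed` are hypothetical; a production system would more likely run the re-embedding as a background job than inside the query path.

```python
import math
from typing import Callable, Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(
    query_text: str,
    store: Dict[str, dict],
    embed: Callable[[str], List[float]],
    current_version: str,
    top_k: int = 3,
) -> List[str]:
    """Return doc ids ranked by similarity, refreshing stale vectors first."""
    query = embed(query_text)
    scored = []
    for doc_id, record in store.items():
        if record["model_version"] != current_version:
            # Stale vector: regenerate it so the comparison is meaningful.
            record["vector"] = embed(record["text"])
            record["model_version"] = current_version
        scored.append((cosine(query, record["vector"]), doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:top_k]]

# Hypothetical stand-in for the current model: counts two keywords.
def toy_embed(text: str) -> List[float]:
    return [1.0 + text.count("cat"), 1.0 + text.count("dog")]

store = {
    # Stale record: its v1 vector would rank it last if compared directly.
    "cats": {"text": "cat cat", "vector": [0.0, 9.0], "model_version": "v1"},
    "dogs": {"text": "dog dog", "vector": [1.0, 3.0], "model_version": "v2"},
}

results = search("cat", store, toy_embed, "v2")
```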
5 / 5
During a PR review, a teammate asks why the team tags and re-embeds stored vectors after an embedding model upgrade instead of just leaving the old vectors in place alongside newly generated ones. What is the reasoning?
A vector from an older embedding model version encodes meaning differently than one from a newer version, so comparing the two directly produces a similarity score that doesn't reflect genuine semantic closeness. Tagging and re-embedding keeps every comparison meaningful. The tradeoff is the added cost and operational effort of regenerating a large existing vector store whenever the embedding model is upgraded.