English for Vespa Search Engine
Learn the English vocabulary for Vespa: document schemas, ranking profiles, tensor fields, and content clusters.
Vespa conversations mix general search-engine vocabulary shared with tools like Elasticsearch (schema, index) with more specialized, ranking-specific terms such as tensor field and first-phase ranking. Keeping the two categories distinct makes it easier to explain whether a relevance issue is a data problem (how documents are stored and indexed) or a scoring problem (how matching documents are ranked).
Key Vocabulary
Document schema — the definition of a document type’s fields, their types, and how each field is indexed, matched, and made available for ranking, declared in Vespa’s schema language.
“That field isn’t matching text searches because it’s defined as attribute only in the schema, not index, so it isn’t being tokenized for text matching.”
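The index/attribute distinction in the example can be illustrated with a small Python model. It is a deliberate simplification, not Vespa's matching code: the hypothetical tokenize helper stands in for Vespa's real linguistic processing (which also handles stemming and normalization), and attribute-only matching is modeled as a whole-value comparison after lowercasing.

```python
import re


def tokenize(text: str) -> list[str]:
    """Hypothetical stand-in for index-time text processing: lowercase, split into words."""
    return re.findall(r"\w+", text.lower())


def matches_index_field(field_value: str, query_term: str) -> bool:
    # Modeled 'index' field: the stored text is tokenized, so one word can match.
    return query_term.lower() in tokenize(field_value)


def matches_attribute_field(field_value: str, query_term: str) -> bool:
    # Modeled 'attribute'-only field: the value is compared as a whole, not tokenized.
    return field_value.lower() == query_term.lower()


title = "Wireless Noise Cancelling Headphones"
print(matches_index_field(title, "headphones"))      # a single word finds the document
print(matches_attribute_field(title, "headphones"))  # a single word does not
print(matches_attribute_field(title, "wireless noise cancelling headphones"))
```

The second call is the "why isn't this showing up" case: the same word that matches an index field finds nothing in an attribute-only field.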
Ranking profile — a named configuration of how documents are scored for a given query, combining features like text match, freshness, and popularity into a single relevance score. “We added a new ranking profile for the mobile app that weights recency more heavily than the default web ranking profile does.”
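In Vespa's own schema syntax this concept is called a rank profile and is declared with a rank-profile block; a query selects one with the ranking.profile request parameter, and the profile named default is used when none is given. The Python below does not use Vespa at all. It only models the idea that two profiles can score the same features differently, with hypothetical feature names and weights chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Doc:
    doc_id: str
    text_match: float  # hypothetical normalized text-relevance feature, 0..1
    freshness: float   # hypothetical: 1.0 = just published, lower = older
    popularity: float  # hypothetical normalized popularity signal, 0..1


# Hypothetical profiles: the same features, weighted differently.
PROFILES: Dict[str, Dict[str, float]] = {
    "default": {"text_match": 0.6, "freshness": 0.1, "popularity": 0.3},
    "mobile": {"text_match": 0.4, "freshness": 0.4, "popularity": 0.2},
}


def score(doc: Doc, profile: str) -> float:
    """Weighted sum of the document's features under the named profile."""
    w = PROFILES[profile]
    return (w["text_match"] * doc.text_match
            + w["freshness"] * doc.freshness
            + w["popularity"] * doc.popularity)


def rank(docs: List[Doc], profile: str) -> List[str]:
    """Document ids ordered by descending score under the named profile."""
    return [d.doc_id for d in sorted(docs, key=lambda d: score(d, profile), reverse=True)]


docs = [
    Doc("evergreen-guide", text_match=0.9, freshness=0.1, popularity=0.8),
    Doc("breaking-news", text_match=0.6, freshness=1.0, popularity=0.3),
]
print("default:", rank(docs, "default"))
print("mobile: ", rank(docs, "mobile"))
```

With these inputs the default profile puts the highly relevant older document first, while the mobile profile promotes the fresh one, which is the behavior the example sentence describes.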
Tensor field — a multi-dimensional numeric field type used to store embeddings or other structured numeric data, enabling operations like dot-product similarity directly inside ranking expressions. “Store the embedding as a tensor field so we can compute cosine similarity against the query vector natively during ranking, instead of in a separate service.”
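The similarity computation mentioned above is simple arithmetic over vectors. This standard-library sketch treats an embedding as a plain list of floats, roughly what a one-dimensional tensor such as tensor&lt;float&gt;(x[4]) holds. Inside Vespa the equivalent dot product is typically written in a ranking expression along the lines of sum(query(q) * attribute(embedding)), where q and embedding are hypothetical names; the Python functions here are illustrative, not Vespa APIs.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two vectors with the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector lengths; 1.0 means same direction."""
    norm_a = math.sqrt(dot(a, a))
    norm_b = math.sqrt(dot(b, b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot(a, b) / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings (real ones typically have hundreds of dimensions).
query_vec = [0.2, 0.1, 0.9, 0.0]
doc_similar = [0.25, 0.05, 0.85, 0.1]
doc_unrelated = [0.0, 0.9, 0.0, 0.4]

for name, vec in [("similar", doc_similar), ("unrelated", doc_unrelated)]:
    print(name, round(cosine_similarity(query_vec, vec), 3))
```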
Content cluster — the group of nodes responsible for storing and serving a set of document types, configurable independently of the container cluster that handles query processing. “We’re scaling the content cluster separately from the container cluster since our stored data and our query load aren’t growing at the same rate.”
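Why the two clusters can scale independently shows up in a back-of-the-envelope sizing sketch. Everything here is hypothetical: the per-node capacities, data size, and query rate are made-up inputs, and real Vespa sizing also depends on memory for attributes, CPU for ranking, and more. The redundancy argument loosely mirrors the number of copies a content cluster keeps of each document. The point is only that the content-node count follows stored data while the container-node count follows query load.

```python
import math


def content_nodes_needed(data_gb: float, redundancy: int, gb_per_node: float) -> int:
    """Nodes needed to hold `redundancy` copies of the data (hypothetical capacity model)."""
    return math.ceil(data_gb * redundancy / gb_per_node)


def container_nodes_needed(peak_qps: float, qps_per_node: float) -> int:
    """Stateless query-processing nodes needed for the peak query rate (hypothetical model)."""
    return math.ceil(peak_qps / qps_per_node)


# Hypothetical quarter: stored data doubles, query volume stays flat.
before = (content_nodes_needed(400, 2, 200), container_nodes_needed(1500, 500))
after = (content_nodes_needed(800, 2, 200), container_nodes_needed(1500, 500))
print("before:", before, "after:", after)  # before: (4, 3) after: (8, 3)
```

Only the first number changes, which is the situation described in the scaling example later in this section.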
First-phase vs. second-phase ranking — a two-stage scoring approach where a cheap first-phase function ranks all matching documents, and a more expensive second-phase function re-ranks only the top candidates. “Move that expensive feature computation into second-phase ranking — running it on every matching document in first-phase is what’s driving up query latency.”
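The two-stage idea can be sketched in a few lines of Python. This is a simplified single-node model, not Vespa's implementation: rerank_count plays the role of the second-phase rerank-count setting, the fields bm25 and similarity are hypothetical stand-ins for real ranking features, and documents outside the re-ranked candidates are simply dropped here rather than handled the way Vespa does.

```python
from typing import Callable, Dict, List

Doc = Dict[str, float]


def two_phase_rank(docs: List[Doc], first_phase: Callable[[Doc], float],
                   second_phase: Callable[[Doc], float],
                   rerank_count: int, hits: int) -> List[Doc]:
    # First phase: the cheap function scores every matching document.
    candidates = sorted(docs, key=first_phase, reverse=True)[:rerank_count]
    # Second phase: the expensive function runs only on the surviving candidates.
    reranked = sorted(candidates, key=second_phase, reverse=True)
    return reranked[:hits]


second_phase_calls: List[float] = []  # records each expensive evaluation


def cheap_score(doc: Doc) -> float:
    return doc["bm25"]


def expensive_score(doc: Doc) -> float:
    second_phase_calls.append(doc["id"])
    return doc["bm25"] + 10.0 * doc["similarity"]


docs = [{"id": i, "bm25": float(i % 50), "similarity": (i * 37 % 100) / 100}
        for i in range(1000)]
top = two_phase_rank(docs, cheap_score, expensive_score, rerank_count=100, hits=10)
print(len(docs), "matches,", len(second_phase_calls), "second-phase evaluations")
```

Only the 100 re-ranked candidates pay for the expensive function, which is the latency argument made in the example sentence.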
Common Phrases
- “Is this field indexed for text search, or is it just an attribute we’re filtering on?”
- “Which ranking profile is this query actually using — the default, or a custom one?”
- “Should this embedding be a tensor field so we can rank on it directly?”
- “Is the content cluster or the container cluster the bottleneck under this load?”
- “Could we move this expensive feature to second-phase ranking instead of running it on every candidate?”
Example Sentences
Debugging a relevance complaint: “Results looked off because the query was hitting the default ranking profile instead of the custom one that weights our business-priority signal — we hadn’t wired the new profile into that endpoint yet.”
Explaining a latency fix: “We cut p99 latency by moving the tensor similarity computation from first-phase to second-phase ranking, since first-phase only needs a cheap approximate score to narrow the candidate set.”
Discussing a scaling decision: “We scaled the content cluster independently of the container cluster this quarter — storage was growing fast from new document types, but query volume was flat.”
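The fix in the first example, making sure a request actually names the intended profile, comes down to one query parameter. This sketch builds a Vespa query URL with the standard library. The endpoint address, the profile name mobile, and the helper build_search_url are hypothetical; yql, query, hits, and ranking.profile are parameters of Vespa's query API, and /search/ is its query endpoint.

```python
from urllib.parse import urlencode


def build_search_url(endpoint: str, user_query: str, rank_profile: str, hits: int = 10) -> str:
    """Build a GET URL for Vespa's /search/ endpoint that names the rank profile explicitly."""
    params = {
        "yql": "select * from sources * where userQuery()",
        "query": user_query,              # the text that userQuery() parses
        "ranking.profile": rank_profile,  # if omitted, the profile named "default" is used
        "hits": hits,
    }
    return f"{endpoint.rstrip('/')}/search/?{urlencode(params)}"


# Hypothetical local endpoint and profile name.
url = build_search_url("http://localhost:8080", "wireless headphones", "mobile")
print(url)
```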
Professional Tips
- Distinguish document schema field types (index vs. attribute) explicitly when debugging a missing search result; it’s one of the most common sources of “why isn’t this showing up” confusion.
- Name the specific ranking profile in use when discussing a relevance issue: “search results are bad” is far less actionable than “the mobile ranking profile is underweighting recency.”
- Use “tensor field” precisely, for embeddings or other structured numeric data, not as a synonym for any numeric field; it signals that the field supports vector operations in ranking expressions.
- Reference first-phase and second-phase ranking by name when proposing a latency fix — it shows you understand Vespa’s specific two-stage scoring model, not just “make ranking faster.”
Practice Exercise
- Explain the difference between an index field and an attribute field in one sentence.
- Describe what a ranking profile controls and why an application might need more than one.
- Write a sentence explaining why first-phase and second-phase ranking exist as separate stages.