English for Feast Feature Store Developers

Learn the English vocabulary for Feast: feature views, online and offline stores, point-in-time joins, and feature serving for ML.

Conversations about Feast, the open-source feature store, mix data-engineering and ML vocabulary in ways that don’t map neatly onto either discipline alone, so ML engineers and data engineers on the same team need to agree on precise terms.

Key Vocabulary

Feature view — a definition that groups related features together with their source data, entity keys, and a time-to-live, serving as the unit Feast uses to materialize and serve data. “Split the user features into their own feature view — bundling them with the transaction features means we’re refreshing slow-changing data far too often.”
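
To make the parts of a feature view concrete, here is a minimal plain-Python sketch. `FeatureViewSpec` is a hypothetical stand-in, not Feast’s own `FeatureView` class, and the view names, feature names, sources, and TTLs are illustrative assumptions. It only shows that a feature view bundles features with an entity key, a source, and a time-to-live, and how splitting views by refresh cadence lets each one carry its own TTL.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class FeatureViewSpec:
    """Hypothetical stand-in for a feature view; not Feast's FeatureView class."""

    name: str
    entities: tuple[str, ...]  # join keys, e.g. ("user_id",)
    features: tuple[str, ...]  # feature names stored and served together
    source: str                # hypothetical name of the table holding raw data
    ttl: timedelta             # how old a value may be and still be used

    def is_within_ttl(self, feature_ts: datetime, as_of: datetime) -> bool:
        """True if a value written at feature_ts existed at as_of and is not older than ttl."""
        age = as_of - feature_ts
        return timedelta(0) <= age <= self.ttl


# Split by refresh cadence: slow-changing profile data vs fast-changing activity.
user_profile = FeatureViewSpec(
    name="user_profile",
    entities=("user_id",),
    features=("age_bucket", "home_country"),
    source="warehouse.users",
    ttl=timedelta(days=30),
)
transaction_stats = FeatureViewSpec(
    name="transaction_stats",
    entities=("user_id",),
    features=("txn_count_1h", "avg_amount_1h"),
    source="warehouse.transactions",
    ttl=timedelta(hours=2),
)

now = datetime(2024, 1, 10, 12, 0)
written = now - timedelta(hours=3)
print(user_profile.is_within_ttl(written, now))       # True
print(transaction_stats.is_within_ttl(written, now))  # False
```

In Feast itself the equivalent definition is a `FeatureView` object in your feature repository; check the Feast documentation for its exact parameters rather than relying on this sketch’s field names.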

Online store — a low-latency key-value store, such as Redis or DynamoDB, that Feast uses to serve the most recently materialized feature values at inference time. “Predictions started drifting because the online store hasn’t been materialized since yesterday; lookups are still fast, but the features being served are stale.”
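
The online store’s job can be sketched as a key-value lookup: one read per entity key, returning only the latest materialized values. In this sketch a plain dict stands in for Redis or DynamoDB, and the key layout, the `read_online` helper, and the `max_age` staleness check are hypothetical illustrations, not Feast’s storage format or API.

```python
from datetime import datetime, timedelta

# Hypothetical key layout: (feature_view, entity_key) -> (written_at, values).
online_store = {
    ("transaction_stats", "user_42"): (
        datetime(2024, 1, 9, 8, 0),
        {"txn_count_1h": 3, "avg_amount_1h": 27.5},
    ),
}


def read_online(view, entity_key, now, max_age):
    """Return (values, is_stale) for one entity; values is None on a miss."""
    entry = online_store.get((view, entity_key))
    if entry is None:
        return None, True
    written_at, values = entry
    # The lookup cost is the same either way; staleness is only about age.
    return values, now - written_at > max_age


values, stale = read_online(
    "transaction_stats",
    "user_42",
    now=datetime(2024, 1, 10, 9, 0),
    max_age=timedelta(hours=1),
)
print(values)  # {'txn_count_1h': 3, 'avg_amount_1h': 27.5}
print(stale)   # True: last written 25 hours earlier
```

The point of the sketch is that a stale online store still answers quickly; it just answers with old data, which is why staleness shows up as degraded predictions rather than timeouts.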

Offline store — a store, typically a data warehouse such as BigQuery, Snowflake, or Redshift, used to build historical training datasets by joining feature values as they existed at specific points in time. “For training data, pull from the offline store with a point-in-time join. The online store only keeps the latest value per entity, so it can’t reproduce what a feature looked like when each historical label was recorded.”

Point-in-time join — the process of joining feature values to labeled training examples so that each example gets the most recent feature value at or before its own timestamp, which prevents the model from training on data that wasn’t actually available at that moment. “Without a proper point-in-time join, this model would have leaked future information: the feature values need to reflect what was known at label time, not today.”
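
The rule behind a point-in-time join fits in a few lines: for each labeled row, take the most recent feature value whose timestamp is at or before the label’s timestamp, and ignore anything newer. This is a plain-Python sketch with made-up data and a hypothetical `point_in_time_join` helper, not Feast’s implementation; Feast performs the equivalent join for you when you request historical features from the offline store. The optional `ttl` argument roughly mirrors how a feature view’s time-to-live limits how far back a value may be.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Offline history for one feature: entity -> list of (event_ts, value), sorted by time.
history = {
    "user_42": [
        (datetime(2024, 1, 1), 2),
        (datetime(2024, 1, 5), 7),
        (datetime(2024, 1, 9), 30),
    ],
}

# Labeled training examples: (entity, label_ts, label).
labels = [
    ("user_42", datetime(2024, 1, 6), 0),
    ("user_42", datetime(2024, 1, 10), 1),
    ("user_42", datetime(2024, 1, 20), 1),
    ("user_99", datetime(2024, 1, 6), 0),  # no feature history at all
]


def point_in_time_join(labels, history, ttl=None):
    """Attach the latest feature value at or before each label's timestamp."""
    rows = []
    for entity, label_ts, label in labels:
        events = history.get(entity, [])
        timestamps = [ts for ts, _ in events]
        # Index of the last event with ts <= label_ts, or -1 if there is none.
        i = bisect_right(timestamps, label_ts) - 1
        value = None
        if i >= 0:
            event_ts, candidate = events[i]
            # Drop values older than the TTL instead of joining them.
            if ttl is None or label_ts - event_ts <= ttl:
                value = candidate
        rows.append((entity, label_ts, value, label))
    return rows


rows = point_in_time_join(labels, history, ttl=timedelta(days=3))
print([value for _, _, value, _ in rows])  # [7, 30, None, None]
```

A naive join on each user’s latest value would give the January 6 label the value 30, which was only written three days later; that is exactly the leakage the point-in-time rule blocks.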

Materialization — the process of reading the latest feature values from the offline store and loading them into the online store so they’re available for low-latency serving. “Set up a materialization job to run every fifteen minutes; without it, the online store keeps serving whatever the last manual run loaded.”
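
Materialization can be sketched as a function that scans offline rows in a time window and keeps only the latest value per entity in the online store. The `materialize` helper, the half-open window, and the data below are hypothetical; in Feast you would typically trigger the real thing with the `feast materialize` or `feast materialize-incremental` CLI commands from a scheduler such as cron or Airflow.

```python
from datetime import datetime

# Offline rows for one feature view: (entity, event_ts, values).
offline_rows = [
    ("user_42", datetime(2024, 1, 10, 9, 0), {"txn_count_1h": 3}),
    ("user_42", datetime(2024, 1, 10, 9, 15), {"txn_count_1h": 5}),
    ("user_7", datetime(2024, 1, 10, 9, 5), {"txn_count_1h": 1}),
    ("user_7", datetime(2024, 1, 10, 9, 40), {"txn_count_1h": 4}),
]

# Online store: entity -> (event_ts, values); starts empty.
online_store = {}


def materialize(start, end):
    """Load the latest value per entity among rows with start <= event_ts < end."""
    for entity, event_ts, values in offline_rows:
        if not (start <= event_ts < end):
            continue
        current = online_store.get(entity)
        # Only overwrite with a newer event, so row order does not matter.
        if current is None or event_ts > current[0]:
            online_store[entity] = (event_ts, values)


# Two scheduled runs, each covering a fifteen-minute window.
materialize(datetime(2024, 1, 10, 9, 0), datetime(2024, 1, 10, 9, 15))
materialize(datetime(2024, 1, 10, 9, 15), datetime(2024, 1, 10, 9, 30))

print({entity: values for entity, (_, values) in online_store.items()})
# {'user_42': {'txn_count_1h': 5}, 'user_7': {'txn_count_1h': 1}}
```

The 9:40 row for `user_7` stays invisible to serving until a later run covers it, which is the staleness a missed or manual-only schedule produces.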

Common Phrases

  • “Is this feature view scoped to the right entity, or are we joining on the wrong key?”
  • “Are we serving from the online store, or accidentally hitting the offline store at inference time?”
  • “Did we use a point-in-time join for this training set, or is there a risk of data leakage?”
  • “Is materialization running on a schedule, or are we relying on someone to trigger it manually?”
  • “How stale can these features get before the model’s predictions degrade meaningfully?”

Example Sentences

Debugging a data leakage concern: “The model looked suspiciously good in offline evaluation — turned out the point-in-time join wasn’t applied, so it trained on feature values that didn’t exist yet at label time.”

Explaining an architecture choice: “We split the feature view by refresh cadence — the slow-changing demographic features are separate from the fast-changing behavioral ones, so materialization doesn’t waste cycles.”

Reviewing a pull request: “This feature view’s TTL is too long for how often the underlying data changes; values that should count as expired will still be served as if they were current.”

Professional Tips

  • Distinguish online store from offline store precisely in any serving-versus-training discussion, since mixing up which store feeds training and which feeds inference is a common and costly source of training-serving skew.
  • Name point-in-time join explicitly when discussing training-data correctness: it’s the specific mechanism that prevents leakage of future data, and naming it signals rigor.
  • Say feature view, not “table” or “dataset,” when describing Feast’s unit of feature definition — it carries entity and freshness semantics a plain table doesn’t.
  • Reference materialization as a concrete, schedulable job when debugging staleness — it moves the conversation from “the data seems old” to “when did materialization last run.”

Practice Exercises

  1. Explain the difference between an online store and an offline store in Feast.
  2. Describe what a point-in-time join prevents and why it matters for training data.
  3. Write a sentence explaining what materialization does and why it needs to run on a schedule.