Master online vs offline stores, point-in-time correctness, training-serving skew prevention, feature pipelines, and low-latency feature serving.
1 / 5
What is the difference between an online store and an offline store in a feature store?
Online vs offline store: training requires joining historical feature values across time, which is the job of the offline store (e.g. BigQuery). Real-time serving requires looking up the current feature value for a given entity in milliseconds, which is the job of the online store (e.g. Redis). A feature store keeps the two consistent by copying the latest values from the offline store to the online store through materialisation pipelines.
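As a rough illustration of materialisation, the sketch below uses a Python list as a stand-in for the offline history and a dict as a stand-in for the online key-value store. The names `offline_store`, `online_store`, and `materialise` are hypothetical and do not come from any particular feature store's API.

```python
from datetime import datetime

# Offline store stand-in: append-only history of (entity_id, event_time, value).
# In practice this would be a warehouse table rather than a list.
offline_store = [
    ("user_1", datetime(2024, 1, 1), 3),
    ("user_1", datetime(2024, 1, 8), 5),
    ("user_2", datetime(2024, 1, 5), 1),
]


def materialise(history):
    """Return the latest value per entity, keyed by entity ID."""
    latest = {}
    for entity_id, event_time, value in history:
        current = latest.get(entity_id)
        # Keep the row with the most recent event_time for each entity.
        if current is None or event_time > current[0]:
            latest[entity_id] = (event_time, value)
    return {entity_id: value for entity_id, (_, value) in latest.items()}


# Online store stand-in: one current value per entity, ready for key lookup.
online_store = materialise(offline_store)
print(online_store["user_1"])
```

The offline side keeps every historical row so training can join across time; the online side keeps only the newest value so serving can do a single key lookup.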
2 / 5
What is point-in-time correctness in feature engineering?
Point-in-time correctness: if a fraud model is trained on "account age at time of transaction," you must look up the account age as it was at that transaction's timestamp, not today's age. Feature stores such as Feast and Tecton perform point-in-time joins when building training datasets. Violating this causes data leakage: the model sees future information during training that will not be available at serving time, so offline metrics overstate real-world performance.
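A minimal sketch of a point-in-time lookup, assuming the feature history for one account is sorted by timestamp. The data and the helper `value_as_of` are hypothetical; real feature stores perform this join inside the warehouse rather than in Python.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical history for one account: (valid_from, account_age_days),
# sorted by valid_from.
feature_history = [
    (datetime(2024, 1, 1), 10),
    (datetime(2024, 2, 1), 41),
    (datetime(2024, 3, 1), 70),
]


def value_as_of(history, event_time):
    """Return the latest value whose timestamp is <= event_time, else None."""
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, event_time)
    return history[idx - 1][1] if idx > 0 else None


transactions = [
    datetime(2024, 1, 15),
    datetime(2024, 2, 20),
    datetime(2023, 12, 1),  # earlier than any known feature value
]

# Each training row gets the value that was valid at its own timestamp.
training_rows = [(t, value_as_of(feature_history, t)) for t in transactions]
```

The transaction from December 2023 gets `None` rather than a later value, which is the point: a row must never be joined to a feature value recorded after its own timestamp.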
3 / 5
What is training-serving skew and how does a feature store help prevent it?
Training-serving skew: the classic case is a feature such as "customer purchase count in last 7 days" that training computes with Spark while serving re-implements it in Python, with a bug or a subtly different definition. A feature store defines the transformation logic once, in Python or SQL, and materialises the result consistently to both the offline training export and the online serving store.
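The idea of defining the logic once can be sketched as a single function that both the training path and the serving path call. The function name `purchase_count_7d` and the window convention (exclusive start, inclusive end) are assumptions made for this illustration.

```python
from datetime import datetime, timedelta


def purchase_count_7d(purchase_times, as_of):
    """Count purchases in the window (as_of - 7 days, as_of]."""
    window_start = as_of - timedelta(days=7)
    return sum(1 for t in purchase_times if window_start < t <= as_of)


purchases = [datetime(2024, 3, 1), datetime(2024, 3, 5), datetime(2024, 3, 9)]

# Training path: evaluate at a historical label timestamp.
training_value = purchase_count_7d(purchases, as_of=datetime(2024, 3, 6))

# Serving path: the same function, evaluated at request time.
serving_value = purchase_count_7d(purchases, as_of=datetime(2024, 3, 13))
```

Because both paths share one definition, a change to the window boundaries or the filtering rule applies to training and serving together.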
4 / 5
What is a feature pipeline in the context of a feature store?
Feature pipeline: batch pipelines (Spark, dbt) run on a schedule to compute aggregations like "weekly average spend." Streaming pipelines (Flink, Spark Streaming) compute low-latency features like "number of logins in the last 5 minutes." Both write to the feature store's offline and/or online store depending on freshness requirements.
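A streaming feature such as "number of logins in the last 5 minutes" can be sketched with a sliding window over event timestamps. This is a single-process illustration that assumes events arrive in timestamp order; `SlidingWindowCounter` is a hypothetical class, not part of Flink or Spark.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events in the window (now - window_seconds, now]."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()

    def add(self, timestamp):
        # Assumes timestamps are appended in non-decreasing order.
        self.events.append(timestamp)

    def count(self, now):
        # Evict events that have fallen out of the window.
        cutoff = now - self.window_seconds
        while self.events and self.events[0] <= cutoff:
            self.events.popleft()
        return len(self.events)


logins = SlidingWindowCounter(window_seconds=300)  # 5 minutes
for ts in (0, 100, 250, 400):
    logins.add(ts)

count_at_400 = logins.count(now=400)
```

A real streaming engine adds what this sketch omits: out-of-order event handling, state checkpointing, and partitioning by entity key.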
5 / 5
What is feature serving and what are typical latency requirements?
Feature serving: at prediction time, the system looks up entity features (e.g. user_id → [age, tenure, last_login_days]) from the online store. Key-value stores such as Redis or DynamoDB typically serve these lookups in single-digit milliseconds (roughly 1-5 ms), depending on network distance and payload size. Feature serving latency adds directly to end-user response time, which makes the online store's performance critical for real-time ML products.
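A minimal sketch of the lookup step, using a dict as a stand-in for the online store and timing the call with `time.perf_counter`. The helper `get_feature_vector` and the default values are hypothetical, and an in-memory dict says nothing about the latency of a real networked store.

```python
import time

# Hypothetical in-memory stand-in for an online store keyed by entity ID.
online_store = {
    "user_42": {"age": 31, "tenure": 18, "last_login_days": 2},
}


def get_feature_vector(store, entity_id, feature_names, defaults):
    """Return features in a fixed order, using defaults for missing values."""
    row = store.get(entity_id, {})
    return [row.get(name, defaults[name]) for name in feature_names]


feature_names = ["age", "tenure", "last_login_days"]
defaults = {"age": 0, "tenure": 0, "last_login_days": -1}

start = time.perf_counter()
known = get_feature_vector(online_store, "user_42", feature_names, defaults)
elapsed_ms = (time.perf_counter() - start) * 1000.0

# An unseen entity falls back to defaults instead of raising an error.
unknown = get_feature_vector(online_store, "user_99", feature_names, defaults)
```

Returning features in a fixed order matters because the model expects the same column layout it saw during training. Measuring `elapsed_ms` around the lookup is one simple way to track how much of the response-time budget feature serving consumes.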