Learn the vocabulary of training a shared model by aggregating locally computed updates without centralizing raw user data.
1 / 5
A teammate explains that a training approach keeps each user's raw data on their own device, sending only locally computed model updates to a central server, which aggregates those updates into a shared global model without ever collecting the raw data itself. What technique is being described?
This is federated learning: each user's raw data stays on their own device, and only locally computed model updates are sent to a central server, which aggregates them into a shared global model. A hash collision is an unrelated hashing concept in which two distinct keys produce the same hash value. Because only updates are aggregated, federated learning suits cases where raw user data is too sensitive or too large to centralize.
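The send-updates-then-aggregate loop can be sketched in a few lines. This is a minimal illustration, not a production algorithm: it assumes a toy linear model, plain gradient descent on each device, and a server that averages weight deltas weighted by each client's example count. The names `local_update` and `aggregate` are hypothetical.

```python
from typing import List, Tuple

Weights = List[float]
Example = Tuple[List[float], float]  # (features, target)


def local_update(global_w: Weights, data: List[Example], lr: float = 0.05) -> Weights:
    """Runs on the device: train on local data, return only the weight delta."""
    w = list(global_w)
    for x, y in data:
        pred = sum(wi * xi for wi, xi in zip(w, x))
        err = pred - y
        # One gradient-descent step on squared error for a linear model.
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    # The raw examples never leave this function; only the delta is returned.
    return [wi - gi for wi, gi in zip(w, global_w)]


def aggregate(global_w: Weights, updates: List[Tuple[Weights, int]]) -> Weights:
    """Runs on the server: average the deltas, weighted by example count."""
    total = sum(n for _, n in updates)
    if total == 0:
        return list(global_w)
    avg = [
        sum(delta[i] * n for delta, n in updates) / total
        for i in range(len(global_w))
    ]
    return [g + a for g, a in zip(global_w, avg)]


if __name__ == "__main__":
    # Two hypothetical devices whose private data both follow y = 2x.
    device_data = [[([1.0], 2.0), ([2.0], 4.0)], [([3.0], 6.0)]]
    w = [0.0]
    for _ in range(30):
        updates = [(local_update(w, d), len(d)) for d in device_data]
        w = aggregate(w, updates)
    print(w)
```

The server only ever sees `(delta, example_count)` pairs, which is the property the explanation above describes.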
2 / 5
During a design review, the team trains a keyboard-prediction model using federated learning across millions of phones, specifically so each phone's private typing data never leaves the device while the global model still improves from aggregated updates. Which capability does this provide?
Federated learning lets the global model improve without centralizing sensitive raw user data: only locally computed updates are aggregated, while the raw typing data stays on-device. Uploading every phone's raw typing data to a central server for training would expose deeply private content and create a large centralized store of sensitive data, which is what federated learning is designed to avoid. That is why it is favored for training on sensitive, distributed user data.
3 / 5
In a code review, a dev notices a keyboard-prediction training pipeline uploads each phone's raw typing logs directly to a central server for batch training, instead of computing model updates locally on-device and aggregating only those updates. What does this represent?
This is a missed opportunity to use federated learning: computing updates on-device and aggregating only those updates would avoid centralizing every phone's raw, sensitive typing logs. A cache eviction policy is an unrelated concept that governs which entries a cache discards when it is full. Uploading raw logs directly is the kind of privacy exposure a reviewer should flag when user data is this sensitive.
4 / 5
An incident report shows a central server holding millions of phones' raw typing logs was breached, exposing deeply private user content, because the training pipeline centralized raw data instead of aggregating locally computed updates. What practice would prevent this?
Switching the pipeline to federated learning lets each phone compute its own model update locally, so only those updates, never the raw typing logs, are sent to the server. Continuing to upload raw typing logs to the central server, regardless of how sensitive the content is or how damaging a breach would be, is what caused the exposure in this incident. Aggregating only local updates is the standard fix once centralizing raw sensitive data is confirmed to be a breach liability.
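One way to picture the fix is to look at what the phone is allowed to upload. The sketch below is illustrative only: the `ClientUpdate` schema, the `compute_update` and `serialize` helpers, and the one-weight toy model (which predicts average word length) are all assumptions, not part of any real pipeline. It shows that the serialized payload carries numbers derived from the logs, not the logs themselves; it is not a formal privacy guarantee.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class ClientUpdate:
    """Hypothetical upload schema: no field can hold raw text."""
    weight_delta: List[float]
    num_examples: int


def compute_update(raw_logs: List[str], global_weights: List[float],
                   lr: float = 0.5) -> ClientUpdate:
    """Runs on the phone. Toy model: a single weight predicting mean word length."""
    words = [w for line in raw_logs for w in line.split()]
    if not words:
        return ClientUpdate([0.0] * len(global_weights), 0)
    mean_len = sum(len(w) for w in words) / len(words)
    # Gradient of 0.5 * (weight - mean_len)^2 with respect to weight.
    grad = global_weights[0] - mean_len
    return ClientUpdate([-lr * grad], len(words))


def serialize(update: ClientUpdate) -> str:
    """The only bytes that leave the device."""
    return json.dumps(asdict(update))


if __name__ == "__main__":
    logs = ["meet me at noon", "secret password hunter2"]
    print(serialize(compute_update(logs, [0.0])))
```

A server that receives only this payload has no raw typing logs to lose in a breach of the kind the incident describes.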
5 / 5
During a PR review, a teammate asks why the team reaches for federated learning instead of simply centralizing all user data and training a model the conventional way, given that centralized training pipelines are more mature and easier to debug. What is the reasoning?
Federated learning accepts added coordination complexity, in on-device computation and update aggregation, in exchange for never having to centralize sensitive raw user data. Conventional centralized training is easier to debug but requires collecting and storing that raw data in one place. Federated learning is therefore favored when raw user data is sensitive or regulated, while conventional centralized training remains simpler and acceptable when the data is not privacy-sensitive.