Advanced Vocabulary #ai-llm #data-science-ml #developer-tools

Mixture-of-Experts (MoE) Vocabulary

Build fluency in the vocabulary of models composed of sparsely activated expert sub-networks.

1 / 5

At standup, a dev mentions a model built from many separate 'expert' sub-networks, where only a small subset is activated for any given input rather than the whole model running every time. What is this architecture called?

2 / 5

During a design review, the team wants a small network to decide, per input token, which handful of experts should actually process it. Which mechanism does this?
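
The per-token selection described in this question can be sketched in a few lines. This is a minimal illustration, not any particular framework's implementation: `select_experts`, `scorer_weights`, and the choice of `k=2` are hypothetical names and values, and the "small network" is assumed to be a single linear layer.

```python
import math

def select_experts(token, scorer_weights, k=2):
    """Return [(expert_index, mixing_weight), ...] for the k best-scoring experts."""
    # One score per expert: dot product of the token with that expert's scoring row.
    scores = [sum(w * x for w, x in zip(row, token)) for row in scorer_weights]
    # Keep only the k highest-scoring experts; all others are skipped for this token.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the kept scores so the mixing weights sum to 1.
    peak = max(scores[i] for i in top)
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# Hypothetical 2-dimensional token and four experts.
token = [1.0, 0.0]
scorer_weights = [[0.1, 0.0], [2.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]
chosen = select_experts(token, scorer_weights, k=2)
```

Here only experts 1 and 2 would run for this token; the other two contribute no compute.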

3 / 5

In a code review, a dev notices an auxiliary loss term is added during training specifically to keep token traffic spread evenly across experts, rather than letting the router collapse onto a favorite few. What does this represent?
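
As a rough sketch of what such an auxiliary term can look like: one formulation common in the sparse-model literature multiplies, per expert, the fraction of tokens dispatched to it by the mean gate probability it received, sums over experts, and scales by the number of experts. Whether the code under review uses this exact form is an assumption, and `auxiliary_loss` and its arguments are hypothetical names. The usual small scaling coefficient is omitted.

```python
def auxiliary_loss(assignments, gate_probs, num_experts):
    """Compute num_experts * sum_i f_i * p_i.

    The value is 1.0 when both dispatch and gate probabilities are uniform,
    and grows as traffic concentrates on a few experts.
    """
    n = len(assignments)
    # f_i: fraction of tokens actually dispatched to expert i.
    f = [assignments.count(i) / n for i in range(num_experts)]
    # p_i: mean gate probability given to expert i across the batch.
    p = [sum(row[i] for row in gate_probs) / n for i in range(num_experts)]
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))

# Two tokens, two experts: an even split versus everything sent to expert 0.
even = auxiliary_loss([0, 1], [[0.5, 0.5], [0.5, 0.5]], 2)
skewed = auxiliary_loss([0, 0], [[0.9, 0.1], [0.9, 0.1]], 2)
```

Adding a term like this to the training objective penalizes the skewed case more than the even one.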

4 / 5

An incident report shows a mixture-of-experts model's quality was noticeably uneven across topics because the router had collapsed onto a handful of experts during training, leaving most of the pool essentially untrained. What practice would prevent this?
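
The collapse described in this incident is visible in per-expert traffic counts. The sketch below is one hypothetical way to summarize them, assuming the expert index chosen for each token was logged during training; `expert_utilization` is an illustrative name, not an existing API.

```python
import math
from collections import Counter

def expert_utilization(assignments, num_experts):
    """Return (share of tokens per expert, normalized entropy in [0, 1]).

    Entropy near 1 means traffic is spread evenly; near 0 means it has
    concentrated on very few experts. Requires num_experts > 1.
    """
    counts = Counter(assignments)
    total = len(assignments)
    shares = [counts.get(i, 0) / total for i in range(num_experts)]
    entropy = -sum(s * math.log(s) for s in shares if s > 0)
    return shares, entropy / math.log(num_experts)

# Hypothetical log: seven of eight tokens went to expert 0, two experts saw nothing.
shares, balance = expert_utilization([0, 0, 0, 0, 0, 0, 0, 1], num_experts=4)
```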

5 / 5

During a PR review, a teammate asks why the team adopts a mixture-of-experts architecture instead of just building one larger dense model with the same total parameter count. What is the reasoning?
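
The comparison in this question comes down to parameters stored versus parameters used per token. A back-of-the-envelope sketch, with entirely hypothetical sizes and ignoring the small cost of the selection network itself:

```python
def moe_param_counts(shared, per_expert, num_experts, k):
    """Return (total parameters stored, parameters used for one token)."""
    # Every expert has to be stored, so capacity grows with num_experts.
    total = shared + num_experts * per_expert
    # Only k experts run for a given token, so per-token compute grows with k.
    active = shared + k * per_expert
    return total, active

# Hypothetical sizes: 1M shared parameters, 8 experts of 0.5M each, 2 active per token.
total, active = moe_param_counts(
    shared=1_000_000, per_expert=500_000, num_experts=8, k=2
)
```

A dense model with the same total would use all 5,000,000 parameters for every token, while this configuration uses 2,000,000.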