Attention Mechanism Vocabulary

Learn the vocabulary of computing a relevance-weighted combination of inputs for every output position.

1 / 5

At standup, a dev describes computing each output position as a weighted combination of all input positions, where the weights reflect how relevant each input is to that particular output, rather than treating every input position equally. What is this technique called?
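
The computation described in this question can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names `softmax` and `attend`, the toy 2-d vectors, and the choice of scaled dot-product scoring are all assumptions made for the example (other relevance scores are possible).

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Return (weights, context) for ONE output position (hypothetical helper)."""
    scale = math.sqrt(len(query))
    # Relevance score of each input position to this output position.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    # Normalize scores into weights that sum to 1.
    weights = softmax(scores)
    # Weighted combination of the input values.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Three input positions with toy vectors (illustrative values only).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [4.0, 0.0]  # stands in for the current output position

weights, context = attend(query, keys, values)
```

Here the query aligns with the first and third keys, so those two positions receive most of the weight and the second position contributes very little to `context`. A different query would produce different weights over the same inputs, which is the point of computing them per output position.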

2 / 5

During a design review, the team adds an attention mechanism to a sequence-to-sequence model because computing a relevance-weighted combination of input positions for each output position avoids compressing an entire long input into a single fixed-size vector before decoding. Which capability does this provide?

3 / 5

In a code review, a dev notices that a sequence-to-sequence feature handling long inputs compresses the entire input into one fixed-size vector before decoding begins, instead of computing a relevance-weighted combination of input positions for each output position. What does this represent?
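
The pattern in this question can be made concrete with a small sketch. It assumes mean pooling as a stand-in for whatever the encoder does to produce its summary (a real model would more likely use a recurrent network's final state); `mean_pool` and the toy inputs are hypothetical names and values chosen for illustration.

```python
def mean_pool(encoder_states):
    # Collapse all input positions into ONE vector of fixed size.
    # Assumption: mean pooling stands in for the encoder's summary step.
    dim = len(encoder_states[0])
    n = len(encoder_states)
    return [sum(state[i] for state in encoder_states) / n for i in range(dim)]

short_input = [[1.0, 0.0], [0.0, 1.0]]   # 2 input positions
long_input = short_input * 50            # 100 input positions
reordered_input = long_input[::-1]       # same positions, opposite order

summary_short = mean_pool(short_input)
summary_long = mean_pool(long_input)
summary_reordered = mean_pool(reordered_input)
```

All three summaries have the same two dimensions regardless of input length, and in this toy case they are identical, so a decoder that sees only the summary cannot tell the inputs apart. Every output position also receives the same vector, with no way to weight individual input positions differently.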

4 / 5

An incident report shows that a translation feature's output quality degraded sharply on long input sentences because it compressed the entire input into one fixed-size vector before decoding, instead of computing a relevance-weighted combination of input positions for each output position. What practice would prevent this?

5 / 5

During a PR review, a teammate asks why the team is adding an attention mechanism rather than simply enlarging the fixed-size summary vector to hold more information. What is the reasoning?