Advanced · Vocabulary · #data-science-ml #backend #developer-tools

Positional Encoding Vocabulary

Build fluency in the vocabulary of adding an order signal to each token's embedding before self-attention.

1 / 5

At standup, a dev mentions adding a signal to each token's embedding that encodes its position in the sequence, specifically because a transformer's self-attention alone treats every token position as interchangeable and has no built-in sense of order. What is this technique called?
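
For reference, one common way to build this signal is the fixed sinusoidal scheme from the original Transformer paper ("Attention Is All You Need"): even dimensions use sine, odd dimensions use cosine, and paired dimensions share a frequency that shrinks geometrically with the dimension index. The sketch below is a minimal pure-Python illustration, not a production implementation; the function names and the toy 4-dimensional embeddings are hypothetical.

```python
import math


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> list[list[float]]:
    """Return a seq_len x d_model table of sinusoidal position vectors."""
    table = []
    for pos in range(seq_len):
        row = []
        for dim in range(d_model):
            i = dim // 2  # dimensions 2i and 2i+1 share one frequency
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positional_encoding(embeddings: list[list[float]]) -> list[list[float]]:
    """Add the position signal element-wise to each token's embedding."""
    seq_len, d_model = len(embeddings), len(embeddings[0])
    pe = sinusoidal_positional_encoding(seq_len, d_model)
    return [[e + p for e, p in zip(tok, pos_vec)] for tok, pos_vec in zip(embeddings, pe)]


# Usage: three hypothetical all-zero 4-dim embeddings, so the output is pure position signal.
tokens = [[0.0] * 4 for _ in range(3)]
encoded = add_positional_encoding(tokens)
print(encoded[0])  # position 0: sin(0) = 0, cos(0) = 1 -> [0.0, 1.0, 0.0, 1.0]
```

Because the encoding is added rather than concatenated, the embedding width stays the same and every later layer sees token identity and position mixed into one vector.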

2 / 5

During a design review, the team adds positional encoding to a transformer's input embeddings, specifically because self-attention alone would treat 'the dog bit the man' and 'the man bit the dog' as the same set of tokens with no notion of order. Which capability does this provide?
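
To see the capability concretely, here is a toy sketch built on several hypothetical simplifications: a single attention head with identity query/key/value projections, random 8-dimensional word vectors, and mean pooling as a stand-in for a sentence representation. Without positions, both sentences pool to the same vector (up to floating-point rounding). After sinusoidal encodings are added, they no longer do.

```python
import math
import random


def positional_encoding(pos, d_model):
    # Sinusoidal scheme: sin on even dims, cos on odd dims, paired dims share a frequency.
    return [
        (math.sin if dim % 2 == 0 else math.cos)(pos / 10000 ** (2 * (dim // 2) / d_model))
        for dim in range(d_model)
    ]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    # Single head with identity Q/K/V projections (a simplifying assumption for the demo).
    d = len(x[0])
    out = []
    for q in x:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x])
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return out


def mean_pool(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def close(u, v, tol=1e-9):
    return all(abs(a - b) <= tol for a, b in zip(u, v))


rng = random.Random(0)
D = 8
vocab = {w: [rng.gauss(0, 1) for _ in range(D)] for w in ["the", "dog", "bit", "man"]}


def sentence_vector(sentence, use_positions):
    x = [vocab[w] for w in sentence.split()]
    if use_positions:
        x = [[e + p for e, p in zip(tok, positional_encoding(pos, D))] for pos, tok in enumerate(x)]
    return mean_pool(self_attention(x))


a, b = "the dog bit the man", "the man bit the dog"
print(close(sentence_vector(a, False), sentence_vector(b, False)))  # True: same bag of tokens
print(close(sentence_vector(a, True), sentence_vector(b, True)))    # False: order now matters
```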

3 / 5

In a code review, a dev notices that a transformer-based model's input pipeline feeds raw token embeddings directly into self-attention with no position signal added at all, so shuffling the input tokens would only reorder the attention outputs without changing any of them. What does this represent?
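
The property behind this is that self-attention is permutation-equivariant: permuting the input rows permutes the output rows in exactly the same way. The sketch below checks this numerically. It assumes random query/key/value projection matrices, random 4-dimensional embeddings, and a hypothetical permutation, and it is written in pure Python for portability.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention with no position information."""
    q = [matvec(wq, t) for t in x]
    k = [matvec(wk, t) for t in x]
    v = [matvec(wv, t) for t in x]
    d = len(q[0])
    out = []
    for qi in q:
        weights = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(d)])
    return out


rng = random.Random(42)
D = 4


def rand_matrix():
    return [[rng.gauss(0, 1) for _ in range(D)] for _ in range(D)]


wq, wk, wv = rand_matrix(), rand_matrix(), rand_matrix()
x = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(5)]  # raw embeddings, no positions
perm = [3, 0, 4, 1, 2]
shuffled = [x[i] for i in perm]

out = self_attention(x, wq, wk, wv)
out_shuffled = self_attention(shuffled, wq, wk, wv)

# Row j of the shuffled run should equal row perm[j] of the original run.
max_diff = max(
    abs(a - b)
    for j, row in enumerate(out_shuffled)
    for a, b in zip(out[perm[j]], row)
)
print(max_diff < 1e-9)  # True
```

The only differences are floating-point rounding from summing in a different order. Each token's output depends on which tokens are present, not where they sit.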

4 / 5

An incident report shows that a transformer-based model consistently failed to distinguish sentences that differed only in word order, such as sentences with the subject and object swapped, because its input pipeline fed raw token embeddings into self-attention with no position signal added at all. What practice would prevent this?
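
One way to prevent this is to add a position signal in the input pipeline and guard it with a regression check. The sketch below assumes a hypothetical `InputPipeline` class that uses a learned absolute position table, an alternative to the fixed sinusoidal scheme, with random values standing in for trained weights. `order_sensitive` is likewise a hypothetical helper. It reports failure when swapping tokens leaves the multiset of input vectors unchanged, which is exactly the condition in the incident.

```python
import random


class InputPipeline:
    """Hypothetical pipeline: token embedding plus a learned absolute position embedding."""

    def __init__(self, vocab, d_model=8, max_len=16, use_positions=True, seed=0):
        rng = random.Random(seed)
        self.use_positions = use_positions
        self.token_table = {tok: [rng.gauss(0, 1) for _ in range(d_model)] for tok in vocab}
        # One vector per position index; random values stand in for trained weights.
        self.position_table = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(max_len)]

    def embed(self, tokens):
        if len(tokens) > len(self.position_table):
            raise ValueError("sequence is longer than the learned position table")
        vectors = []
        for pos, tok in enumerate(tokens):
            vec = self.token_table[tok]
            if self.use_positions:
                vec = [t + p for t, p in zip(vec, self.position_table[pos])]
            vectors.append(list(vec))
        return vectors


def order_sensitive(pipeline, tokens, reordered):
    """Regression check: reordering the same tokens must change the multiset of input vectors.

    Self-attention without a position signal only depends on this multiset, so an
    identical multiset means the model cannot tell the two orders apart.
    """
    before = sorted(tuple(v) for v in pipeline.embed(tokens))
    after = sorted(tuple(v) for v in pipeline.embed(reordered))
    return before != after


vocab = ["the", "dog", "bit", "man"]
original = "the dog bit the man".split()
swapped = "the man bit the dog".split()

print(order_sensitive(InputPipeline(vocab, use_positions=False), original, swapped))  # False
print(order_sensitive(InputPipeline(vocab), original, swapped))                        # True
```

A learned table caps sequence length at `max_len`, which is why `embed` raises for longer inputs. A fixed sinusoidal encoding, by contrast, can be computed for any position.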

5 / 5

During a PR review, a teammate asks why the team adds positional encoding to a transformer instead of switching to a recurrent architecture, which naturally processes tokens in order without needing an explicit position signal. What is the reasoning?
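
A commonly cited reason, and the motivation given in the original Transformer paper, is parallelism. A recurrent network must compute its hidden states one step after another. Self-attention, by contrast, computes every position's output directly from the inputs, so the encoding can supply order information without reintroducing a sequential dependency. The toy sketch below contrasts the two. It assumes a hypothetical Elman-style RNN with random weights and random 4-dimensional token vectors.

```python
import math
import random

rng = random.Random(7)
D = 4


def rand_matrix(scale=0.5):
    return [[rng.gauss(0, scale) for _ in range(D)] for _ in range(D)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


W_H, W_X = rand_matrix(), rand_matrix()
tokens = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(6)]


def rnn_final_state(xs):
    """Elman-style RNN: h_t = tanh(W_h h_{t-1} + W_x x_t)."""
    h = [0.0] * D
    for x in xs:  # inherently sequential: step t cannot start until h_{t-1} exists
        h = [math.tanh(a + b) for a, b in zip(matvec(W_H, h), matvec(W_X, x))]
    return h


def positional(pos):
    # Sinusoidal position vector (even dims sin, odd dims cos).
    return [(math.sin if i % 2 == 0 else math.cos)(pos / 10000 ** (2 * (i // 2) / D)) for i in range(D)]


def attention_row(i, xs):
    """Output for position i reads only the inputs, never another position's output."""
    scores = [sum(a * b for a, b in zip(xs[i], k)) / math.sqrt(D) for k in xs]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * x[c] for w, x in zip(weights, xs)) / total for c in range(D)]


# The RNN picks up order with no explicit position signal...
print(rnn_final_state(tokens) != rnn_final_state(tokens[::-1]))  # True

# ...while attention gets order from the added encoding, and its rows are independent,
# so they can be computed in any order (or all at once on parallel hardware).
xs = [[t + p for t, p in zip(tok, positional(pos))] for pos, tok in enumerate(tokens)]
in_order = [attention_row(i, xs) for i in range(len(xs))]
reversed_rows = [attention_row(i, xs) for i in reversed(range(len(xs)))][::-1]
print(in_order == reversed_rows)  # True
```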