AdvancedVocabulary#data-science-ml#backend#developer-tools

Transformer Architecture Vocabulary

Build fluency in the vocabulary of processing a sequence in parallel using self-attention across every token pair.

0 / 5 completed
1 / 5
At standup, a dev mentions a neural network architecture that processes an entire sequence at once using self-attention to weigh how much every token should attend to every other token, instead of processing tokens one at a time in order. What is this architecture called?