Intermediate | Vocabulary | #data-science-ml #backend #developer-tools

Tokenization Vocabulary

Learn the vocabulary of splitting raw text into subword units mapped to numeric identifiers a model can process.
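As a rough illustration of that idea, here is a minimal sketch of a greedy longest-match subword splitter. The vocabulary, the ID values, and the `encode` and `decode` helper names are all made up for this example; production tokenizers learn their vocabularies from data and handle many more edge cases.

```python
# Hypothetical toy vocabulary: subword piece -> numeric ID (invented for illustration).
VOCAB = {"token": 0, "ization": 1, "un": 2, "believ": 3, "able": 4, " ": 5, "<unk>": 6}


def encode(text, vocab=VOCAB):
    """Split text into subword pieces by greedy longest match and return their IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece first, shrinking until one is in the vocabulary.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No piece matched: emit the unknown ID and skip one character.
            ids.append(vocab["<unk>"])
            i += 1
    return ids


def decode(ids, vocab=VOCAB):
    """Map IDs back to their pieces and join them into a string."""
    inverse = {idx: piece for piece, idx in vocab.items()}
    return "".join(inverse[idx] for idx in ids)


print(encode("tokenization"))        # [0, 1]
print(encode("unbelievable token"))  # [2, 3, 4, 5, 0]
print(decode([2, 3, 4]))             # unbelievable
```

The model never sees the characters themselves, only the list of integer IDs; anything outside the vocabulary falls back to the `<unk>` ID in this sketch.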

1 / 5
At standup, a dev mentions splitting raw input text into smaller units, such as subword pieces or whole words, each mapped to a numeric identifier a language model can actually process. What is this step called?