Practise the standard verbs used when chunking documents for retrieval-augmented generation (RAG).
1 / 5
Fill in: 'We ___ source documents into overlapping passages before indexing them, so a fact that straddles a chunk boundary still appears whole in at least one chunk.'
We 'chunk' documents: this is the established RAG term for splitting text into retrievable passages. The other options aren't the recognised term here.
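A minimal Python sketch of overlapping chunking, assuming fixed-size character windows (production pipelines more often count tokens). `chunk_text`, `chunk_size` and `overlap` are hypothetical names chosen for illustration, not a specific library's API. The last lines show why overlap matters: with zero overlap, a phrase straddling a boundary appears whole in no chunk.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reaches the end
            break
    return chunks


text = "abcdefghij"
print(chunk_text(text, chunk_size=4, overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
print(chunk_text(text, chunk_size=4, overlap=0))  # ['abcd', 'efgh', 'ij']
# "cdef" straddles the boundary at index 4: it survives whole only when chunks overlap.
print(any("cdef" in c for c in chunk_text(text, 4, 0)))  # False
```

The overlap only protects facts shorter than the overlap itself, so it is usually set as a fraction of the chunk size rather than an arbitrary constant.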
2 / 5
Fill in: 'Chunking documents with zero overlap between adjacent passages can ___ a sentence's crucial context stranded in the chunk right before or after the one actually retrieved.'
We say zero overlap can 'leave' context stranded across a boundary: 'leave something stranded' is the natural collocation for the resulting gap. The other options aren't idiomatic here.
3 / 5
Fill in: 'We ___ chunks against the embedding model's context limit and typical query length, rather than picking a number that just feels reasonable.'
We 'size' chunks: the standard collocation for choosing an appropriate passage length before indexing. The other options are less idiomatic here.
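Part of sizing chunks is plain arithmetic. A minimal sketch, assuming token counts are already known: `check_chunk_size` rejects sizes the embedding model would truncate or refuse, and `chunk_count` reports how many sliding-window chunks a document produces for a given size and overlap. Both function names and the 512-token limit below are hypothetical; matching typical query length is an empirical choice this sketch does not model.

```python
import math


def check_chunk_size(chunk_size: int, model_max_tokens: int) -> None:
    """Raise if a chunk would exceed the embedding model's input limit."""
    if chunk_size > model_max_tokens:
        raise ValueError(f"chunk_size {chunk_size} exceeds model limit {model_max_tokens}")


def chunk_count(doc_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding-window chunks needed to cover doc_tokens tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= chunk_size:
        return 1
    # The first chunk covers chunk_size tokens; each later one adds (chunk_size - overlap) new tokens.
    return math.ceil((doc_tokens - overlap) / (chunk_size - overlap))


MODEL_MAX_TOKENS = 512  # hypothetical embedding-model limit
for size in (200, 400):
    check_chunk_size(size, MODEL_MAX_TOKENS)
    print(size, chunk_count(10_000, size, overlap=50))
# 200 67
# 400 29
```

Halving the chunk size here more than doubles the number of vectors to index, a cost to weigh alongside retrieval precision.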
4 / 5
Fill in: 'We ___ retrieval quality with recall@k against a labelled set of question-answer pairs before trusting a new chunking strategy in production.'
We 'evaluate' retrieval quality: the standard verb for measuring retrieval performance against a labelled benchmark. The other options are less idiomatic here.
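A minimal sketch of recall@k, assuming each labelled question has been mapped to the IDs of the chunks that contain its answer (the IDs and data below are made up). Definitions vary between tools: this version averages the per-question fraction of relevant chunks found in the top k, whereas some tools instead report a hit rate (1 if any relevant chunk appears). `recall_at_k` is a hypothetical name.

```python
def recall_at_k(retrieved: list[list[str]], relevant: list[set[str]], k: int) -> float:
    """Mean over questions of (relevant chunks in top k) / (all relevant chunks)."""
    if len(retrieved) != len(relevant):
        raise ValueError("need one relevant set per query")
    scores = []
    for ranked, gold in zip(retrieved, relevant):
        if not gold:  # skip questions with no labelled answer chunk
            continue
        hits = len(set(ranked[:k]) & gold)
        scores.append(hits / len(gold))
    return sum(scores) / len(scores) if scores else 0.0


retrieved = [["c1", "c7", "c3"], ["c9", "c2", "c4"]]  # ranked chunk IDs per question
relevant = [{"c3"}, {"c5", "c2"}]                      # chunks that contain each answer
print(recall_at_k(retrieved, relevant, k=2))  # 0.25
print(recall_at_k(retrieved, relevant, k=3))  # 0.75
```

Because a new chunking strategy changes the chunk IDs, the labels have to be re-mapped (for example, by checking which new chunks contain the answer text) before the two strategies can be compared.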
5 / 5
Fill in: 'We ___ chunk boundaries at natural section breaks where possible, rather than cutting mid-sentence purely at a fixed character count.'
We 'align' chunk boundaries: the established collocation for matching chunk edges to a document's natural structure. The other options aren't the recognised term here.
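A minimal sketch of boundary-aligned chunking, assuming blank lines mark the natural section breaks (real documents may need heading- or sentence-aware splitting instead). It greedily packs whole sections into chunks of at most `max_chars` characters and falls back to fixed-size slices only for a section that is too long on its own. `chunk_by_sections` is a hypothetical name.

```python
import re


def chunk_by_sections(text: str, max_chars: int) -> list[str]:
    """Pack blank-line-separated sections into chunks of at most max_chars characters."""
    sections = [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]
    chunks: list[str] = []
    current = ""
    for section in sections:
        if len(section) > max_chars:
            # Oversized section: flush what we have, then cut it at fixed offsets.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(section[i:i + max_chars] for i in range(0, len(section), max_chars))
            continue
        candidate = f"{current}\n\n{section}" if current else section
        if len(candidate) <= max_chars:
            current = candidate  # the section still fits in the current chunk
        else:
            chunks.append(current)  # close the chunk at a section break
            current = section
    if current:
        chunks.append(current)
    return chunks


doc = "Intro para.\n\nSetup section.\n\nUsage section."
print(chunk_by_sections(doc, max_chars=30))
# ['Intro para.\n\nSetup section.', 'Usage section.']
```

Every chunk edge falls on a blank line unless a single section exceeds the limit, which keeps mid-sentence cuts to the cases where they are unavoidable.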