Sampling Parameters Vocabulary

Temperature, top-p (nucleus) sampling, top-k, repetition penalty, frequency penalty, presence penalty, max tokens, and stop sequences.

Key vocabulary

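Temperature: scales the logits before the softmax to control randomness; low values (e.g., 0.1) concentrate probability on the most likely tokens and make output more deterministic, while high values (e.g., 1.2) flatten the distribution and make output more varied.
Top-p (nucleus sampling): restricts sampling to the smallest set of most probable tokens whose cumulative probability reaches p; the set shrinks when the model is confident and grows when it is not, which balances diversity and coherence.
Top-k sampling: restricts sampling to the k most probable next tokens at each step, regardless of how the probability is spread among them.
Repetition penalty: reduces the probability of tokens that have already appeared in the output (some implementations also count the prompt), discouraging repetitive text.
Stop sequences: strings that cause generation to halt as soon as the model produces them (e.g., "\n\n", "END"); many APIs omit the stop sequence itself from the returned text.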
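
The five terms above can be made concrete with a small sketch in plain Python. This is an illustration under stated assumptions, not any particular library's implementation: the function names and the four-token example logits are made up, the top-p cutoff uses "cumulative probability >= p" (implementations differ on the exact boundary), and the repetition penalty follows one common convention of dividing positive logits and multiplying negative ones by the penalty.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_filter(probs, k):
    """Keep the k most probable token ids and renormalize their probabilities."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def top_p_filter(probs, p):
    """Keep the smallest high-probability set whose cumulative mass reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:  # assumption: boundary is inclusive
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}

def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    """Lower the logits of already-seen tokens (one common convention)."""
    out = list(logits)
    for i in set(seen_token_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out

def truncate_at_stop(text, stop_sequences):
    """Cut text at the earliest stop sequence; the stop string itself is dropped."""
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Hypothetical logits for a four-token vocabulary (token ids 0..3).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax_with_temperature(logits, temperature=1.0)
print(top_k_filter(probs, k=2))    # always exactly two candidates
print(top_p_filter(probs, p=0.9))  # candidate count depends on the distribution
```

In this sketch top-k always returns a fixed number of candidates, while top-p returns however many tokens are needed to reach the probability mass p, so its candidate set adapts to how peaked the distribution is.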

1 / 5

A colleague sets temperature = 0.1 for a code generation task. What effect does this have?

2 / 5

What does top-p = 0.9 (nucleus sampling) mean in practice?

3 / 5

A developer sets a high repetition penalty. What problem are they solving?

4 / 5

What are stop sequences used for in LLM API calls?

5 / 5

How does top-k sampling differ from top-p (nucleus) sampling?