I understand it differently: LLMs predict distributions, not specific tokens. Th...

colah3 · 2025-03-27T23:31:35 1743118295

Just to be clear, the probability for "An" is high, just based on the prefix. You don't need to do beam search.

astrange · 2025-03-28T21:15:30 1743196530

They almost certainly only do greedy sampling. Beam search would be a lot more expensive. I'm also personally skeptical about using a complicated search algorithm for inference when the model was trained for a simple one, but maybe it's fine?
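
To make the cost difference concrete, here is a minimal sketch comparing the two decoding strategies. Everything in it is an assumption for illustration: `TOY_MODEL` is a made-up lookup table standing in for a real network, the probabilities are invented, and `greedy_decode` / `beam_search` are hypothetical helper names, not any library's API. "Greedy" here means taking the single most likely token at each step.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to a
# next-token distribution. The numbers are invented for illustration.
TOY_MODEL = {
    (): {"An": 0.6, "A": 0.4},
    ("An",): {"apple": 0.55, "owl": 0.45},
    ("A",): {"banana": 0.9, "pear": 0.1},
}


def next_token_probs(prefix):
    """Stand-in for one forward pass: return the distribution for a prefix."""
    return TOY_MODEL[tuple(prefix)]


def greedy_decode(steps):
    """Pick the most likely token at every step: one model call per step."""
    seq = []
    for _ in range(steps):
        probs = next_token_probs(seq)
        seq.append(max(probs, key=probs.get))
    return seq


def beam_search(steps, beam_width):
    """Keep the beam_width best partial sequences by cumulative log-probability."""
    beams = [(0.0, [])]  # (log-probability so far, tokens so far)
    calls = 0  # number of model calls, to show the extra cost
    for _ in range(steps):
        candidates = []
        for score, seq in beams:
            probs = next_token_probs(seq)
            calls += 1
            for tok, p in probs.items():
                candidates.append((score + math.log(p), seq + [tok]))
        # Sort by score only, best first, and keep the top beam_width.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1], calls


print(greedy_decode(2))
print(beam_search(2, beam_width=2))
```

In this toy table, greedy commits to "An" because it is the likeliest first token, while a beam of width 2 also explores "A" and ends up with a sequence of higher total probability, at the price of more model calls per step. Whether that trade is worth it for a real model is exactly the open question above.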