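LLMs predict probability distributions over the next token, not specific tokens. A decoding algorithm, such as beam search, then selects the tokens from those distributions.
So the LLM predicts something like:
1. ["a", "an", ...]
2. ["astronomer", "cosmologist", ...]
where each distribution is conditioned on the tokens chosen so far, and "an astronomer" is selected because it is the most likely sequence overall.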
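The selection step can be illustrated with a small sketch. It assumes a hypothetical toy "model" (`TOY_MODEL`, with invented probabilities) in place of a real LLM, and the helper names `next_token_distribution`, `beam_search` and `greedy` are illustrative, not any library's API. In this toy, "a" is the single most likely first token, so greedy decoding commits to it. Beam search keeps the two best partial sequences at each step and finds that "an astronomer" has the highest overall probability (0.4 × 0.9 = 0.36, versus 0.5 × 0.5 = 0.25 for "a cosmologist").

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to a
# next-token probability distribution. A real LLM computes these with a
# neural network; the numbers here are invented for illustration.
TOY_MODEL = {
    (): {"a": 0.5, "an": 0.4, "the": 0.1},
    ("a",): {"cosmologist": 0.5, "physicist": 0.4, "astronomer": 0.1},
    ("an",): {"astronomer": 0.9, "cosmologist": 0.1},
    ("the",): {"astronomer": 0.5, "cosmologist": 0.5},
}


def next_token_distribution(prefix):
    """Return the toy model's next-token distribution given the prefix."""
    return TOY_MODEL[tuple(prefix)]


def beam_search(num_steps, beam_width):
    """Keep the beam_width highest-scoring partial sequences at each step.

    Scores are summed log-probabilities, so a sequence's final score is
    the log of the product of its per-step token probabilities.
    """
    beams = [((), 0.0)]  # list of (tokens, log-probability)
    for _ in range(num_steps):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_distribution(tokens).items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Prune to the best beam_width candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def greedy(num_steps):
    """Pick the single most likely token at each step."""
    tokens = ()
    for _ in range(num_steps):
        dist = next_token_distribution(tokens)
        tokens += (max(dist, key=dist.get),)
    return tokens


if __name__ == "__main__":
    print("greedy:", " ".join(greedy(2)))  # greedy: a cosmologist
    for tokens, score in beam_search(2, beam_width=2):
        print(f"beam: {' '.join(tokens)}  p={math.exp(score):.2f}")
    # beam: an astronomer  p=0.36
    # beam: a cosmologist  p=0.25
```

With `beam_width=1`, `beam_search` reduces to greedy decoding and returns "a cosmologist"; widening the beam is what lets it recover the better sequence that starts with the less likely first token.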