
I understand it differently:

LLMs predict distributions, not specific tokens. Then an algorithm, like beam search, is used to select the tokens.

So the LLM predicts something like 1. ["a", "an", ...] 2. ["astronomer", "cosmologist", ...],

where "an astronomer" is selected as the most likely result.



Just to be clear, the probability for "An" is already high based on the prefix alone. You don't need to do beam search.
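
You can see this directly by inspecting the next-token distribution; here's a sketch using the Hugging Face transformers library (the model name and prompt are arbitrary placeholders, and the actual probabilities depend on the model):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "She spent her whole career studying the stars; she is"
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token only
    probs = torch.softmax(logits, dim=-1)

    for word in [" a", " an"]:
        tid = tok.encode(word)[0]
        print(f"P({word!r} | prompt) = {probs[tid].item():.4f}")

If the model has in effect already "committed" to a continuation starting with a vowel sound, that shows up in this one-step distribution; no search is needed.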


They almost certainly only do greedy sampling. Beam search would be a lot more expensive; I'm also personally skeptical about using a complicated search algorithm at inference time when the model was trained for a simple one, but maybe it's fine?
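
For contrast, greedy decoding is just an argmax at each step; a self-contained sketch using the same made-up toy table as upthread:

    # Greedy decoding: take the single most probable token at each step.
    # (Same made-up toy table as upthread, repeated so this runs on its own.)
    def next_token_distribution(prefix):
        table = {
            (): {"a": 0.5, "an": 0.4, "the": 0.1},
            ("a",): {"scientist": 0.7, "cosmologist": 0.3},
            ("an",): {"astronomer": 0.9, "expert": 0.1},
            ("the",): {"telescope": 1.0},
        }
        return table[tuple(prefix)]

    def greedy_decode(depth=2):
        tokens = []
        for _ in range(depth):
            dist = next_token_distribution(tokens)
            tokens.append(max(dist, key=dist.get))
        return tokens

    print(greedy_decode())  # ['a', 'scientist']: locally best, total prob 0.35

Greedy costs one forward pass per emitted token, while a width-k beam costs roughly k forward passes per step, which is the extra expense being referred to here.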



