Hacker News

Yes, you are missing that the tokens aren't words; they're groups of a few letters, or chunks of arbitrary size depending on the model.


Nope, I'm not missing that particular fact. I'm aware that sentences (and words) are split into tokens, which are then mapped to vectors (embeddings).

I still don't understand how most LLMs can spell out words, though, nor what causes the failure to count characters in words. I wasn't convinced by the comment I was responding to.
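To make the tokenization point concrete, here is a toy sketch (the splits below are hypothetical, not from any real model's vocabulary): a subword tokenizer hands the model a short sequence of token IDs, not individual characters, so the character count of a word is never directly visible to it.

```python
# Toy illustration of subword tokenization. The splits are made up for
# demonstration; real tokenizers (BPE, WordPiece, etc.) learn their own
# merges, so actual splits vary by model.
toy_splits = {
    "strawberry": ["str", "aw", "berry"],
    "tokenization": ["token", "ization"],
}

def toy_tokenize(word: str) -> list[str]:
    # Fall back to the whole word if we have no split for it.
    return toy_splits.get(word, [word])

tokens = toy_tokenize("strawberry")
print(tokens)  # ['str', 'aw', 'berry']

# The model receives 3 token IDs here, not 10 characters. Counting
# letters (e.g. the r's in "strawberry") requires it to have memorized
# each token's spelling, which is one plausible reason such counts fail.
print(len(tokens), sum(len(t) for t in tokens))  # 3 10
```

This doesn't settle *why* models can often spell words letter by letter yet miscount letters, but it shows that neither ability comes for free from the input representation.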



