Hacker News

Yes, you are missing that the tokens aren't words; they're groups of a few letters, or chunks of arbitrary size depending on the model.


Nope, I'm not missing that particular fact. I'm aware that sentences (and words) are split into tokens, which are then mapped to vectors (embeddings).

I still don't understand how most LLMs can spell out words, though, nor what causes the failure to count characters in words. I wasn't convinced by the comment I was responding to.
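To make the tokenization point concrete, here is a toy sketch (the splits below are hypothetical, not from any real model's vocabulary): a subword tokenizer hands the model a short sequence of token IDs, not individual characters, so the character count of a word is never directly visible to it.

```python
# Toy illustration of subword tokenization. The splits are made up for
# demonstration; real tokenizers (BPE, WordPiece, etc.) learn their own
# merges, so actual splits vary by model.
toy_splits = {
    "strawberry": ["str", "aw", "berry"],
    "tokenization": ["token", "ization"],
}

def toy_tokenize(word: str) -> list[str]:
    # Fall back to the whole word if we have no split for it.
    return toy_splits.get(word, [word])

tokens = toy_tokenize("strawberry")
print(tokens)  # ['str', 'aw', 'berry']

# The model receives 3 token IDs here, not 10 characters. Counting
# letters (e.g. the r's in "strawberry") requires it to have memorized
# each token's spelling, which is one plausible reason such counts fail.
print(len(tokens), sum(len(t) for t in tokens))  # 3 10
```

This doesn't settle *why* models can often spell words letter by letter yet miscount letters, but it shows that neither ability comes for free from the input representation.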



