I was starting to mess around with the latest LLMs and found that they're not great at counting lines in files.
I gave Gemini 2.5 Flash a Python script and asked it to tell me what was at line 27, and it consistently got it wrong. I tried repeatedly to prompt it the right way, but had no luck.
https://g.co/gemini/share/0276a6c7ef20
Is this something LLMs are still not good at? I thought they'd gotten past the "strawberry" counting problem.
Here's the raw file: https://pastebin.com/FBxhZi6G
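For reference, checking the answer deterministically is trivial; here's a rough Python sketch (the filename is just a placeholder):

    # Print the line at a given 1-indexed position without loading the whole file.
    from itertools import islice

    def line_at(path: str, lineno: int) -> str:
        with open(path, encoding="utf-8") as f:
            # islice lazily skips to the requested line
            line = next(islice(f, lineno - 1, lineno), None)
        if line is None:
            raise IndexError(f"{path} has fewer than {lineno} lines")
        return line.rstrip("\n")

    print(line_at("script.py", 27))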
Imagine you spoke perfect English, but you learned to write it using Mandarin characters, basically picking the closest-sounding Mandarin characters to spell out English. Then someone asks you how many letter o's are in the sentence "Hello how are you?". Well, you don't read English characters, you read Mandarin characters, so you read it as "哈咯,好阿优?", because that's the closest-sounding way to spell "Hello how are you?" with Mandarin characters.
So now if someone asks you how many letter o's are in "哈咯,好阿优?", you don't really know. You're conceptually aware that the letter o exists, you know that if you spelled the sentence out in English it would contain the letter o, and you can maybe make an educated guess about how many there are, but you can't actually count them out, because you've never seen actual English letters.
The same thing goes for LLMs: they don't see characters, they only see tokens. They're aware that characters exist, and they can reason about their existence, but they can't see them, so they can't really count them out either.
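You can see this directly by running a tokenizer yourself. Here's a minimal sketch using OpenAI's tiktoken library (Gemini uses its own tokenizer, but the principle is the same):

    # pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    text = "Hello how are you?"

    tokens = enc.encode(text)
    print(tokens)  # a short list of integer IDs; this is all the model "sees"

    # Each ID maps to a chunk of bytes, not to individual letters:
    for t in tokens:
        print(t, enc.decode_single_token_bytes(t))

    # Python, which actually iterates over characters, counts the o's trivially:
    print(text.count("o"))  # 3

Line counting fails for the same reason: newlines get folded into tokens along with the surrounding text, so the model can't reliably index into a file by line either.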