
You are wrong: there could be a circuit to count letters, because the model can easily normalize tokens back into characters internally, just as we know it can transform text to and from base64 just fine. So nothing about the tokenized input prevents such a circuit from existing.

Training is just too dumb to create such a circuit even with all that massive data input, but it's easy for a human to build such a neural net by hand over those input tokens. It's simply the kind of problem transformers are exceedingly bad at learning, so they never pick it up well even though it's a very simple computation for them to perform.
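To make that concrete, here is a toy sketch of such a hand-built counting circuit (my own illustration, not anything a real model does; the vocabulary and tokenization are made up): a fixed embedding that stores the letter count of each token, followed by a sum over positions, is already enough.

    import numpy as np

    vocab = ["straw", "berry", "rasp", "ber", "ry"]   # hypothetical toy token vocabulary
    target = "r"

    # "Embedding" table: one scalar per token = how many times the target letter
    # appears inside that token. A human can write this table down directly.
    embed = np.array([tok.count(target) for tok in vocab], dtype=float)

    def count_letter(token_ids):
        # The whole circuit: embed each token, then sum over positions.
        return embed[np.array(token_ids)].sum()

    print(count_letter([0, 1]))   # "straw" + "berry" -> 3.0

The point is just that the letter information is recoverable from token-level inputs with a trivially small amount of computation; it's the training signal, not the architecture, that fails to produce it.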



Transformers have a limited computation budget per generated token that is tied to the size of the context, so they can get better at math the longer the conversation is.
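Back-of-the-envelope version of that (my own rough numbers, assuming a generic decoder-only transformer, not any specific model): the feed-forward cost per generated token is fixed, but the attention cost grows with the number of cached tokens, so a longer conversation literally buys more FLOPs per answer token.

    def flops_per_token(n_ctx, d_model=4096, n_layers=32):
        # Rough cost model: the MLP term is independent of context length,
        # while the attention term scales linearly with the cached context.
        mlp = n_layers * 8 * d_model ** 2
        attn = n_layers * 4 * d_model * n_ctx
        return mlp + attn

    for n in (128, 4096, 32768):
        print(n, f"{flops_per_token(n):.2e}")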



