
I think that while the mathematics of neural networks is completely understood, we do not really understand why neural networks behave the way they do when combined with large amounts of real-world data.

In particular, the ability of autoregressive, transformer-based networks to produce coherent sequences of speech while being immutable still shocks me whenever I think about it. Of course, this says as much about what we think of ourselves and other humans as it does about the matrices. I also think the weather-forecasting networks are quite shocking; the compression they have achieved in modeling the physical system that produces weather frankly feels... wrong... but it obviously does actually work.
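(To make "producing a sequence while being immutable" concrete, here is a minimal sketch of the autoregressive loop. The "model" is just a fixed random bigram logit table standing in for a transformer; the point is only that the weights are read-only while the sequence grows.)

    # Minimal autoregressive sampling with frozen parameters.
    # W is a stand-in for trained weights; it never changes during generation.
    import numpy as np

    rng = np.random.default_rng(0)
    vocab = list("abcdefgh ")
    V = len(vocab)
    W = rng.normal(size=(V, V))   # frozen "weights": logits for next token given current one

    def sample_next(token_id):
        logits = W[token_id]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(V, p=probs)

    seq = [0]                     # start token
    for _ in range(20):
        seq.append(sample_next(seq[-1]))   # W is only read, never updated

    print("".join(vocab[i] for i in seq))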



You can represent many things with numbers and build an algorithm that does stuff. ML techniques are formulas where some specific constants are not known yet, so you go through a training phase to find them.
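(A toy illustration of that "formula with unknown constants" view, assuming the simplest possible formula y = a*x + b: training is just a search for the constants a and b that make the formula fit the data.)

    # Fit the unknown constants of y = a*x + b by gradient descent on squared error.
    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, size=100)
    y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=100)   # data produced by "unknown" constants

    a, b = 0.0, 0.0        # the constants we don't know yet
    lr = 0.1
    for _ in range(500):   # the "training phase": nudge a and b to reduce the error
        pred = a * x + b
        grad_a = 2 * np.mean((pred - y) * x)
        grad_b = 2 * np.mean(pred - y)
        a -= lr * grad_a
        b -= lr * grad_b

    print(a, b)            # ends up close to 3.0 and 0.5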

While combinations of words are infinite, only some make sense, so there are a lot of recurrent patterns there. When you take a huge dataset like most of the internet and digital documents, I would be more surprised if the trained model were incapable of producing correct text, since it's overfitted to both the grammar and the lexicon. And I believe it's overfitted to general conversation patterns as well.


There is a lot of retrieval in the behaviour of LLMs, but I find it hard to characterize it as overfitting. For example, ask ChatGPT to respond to your questions with grammatically incorrect answers; it will comply, which pure overfitting to grammar would not predict.
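(If you want to try that experiment yourself, a rough sketch using the openai Python client, version 1.x, is below. The model name is just a placeholder; use whatever model you have access to.)

    # Ask the model to deliberately break grammar, despite grammatical training data.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=[
            {"role": "system",
             "content": "Answer every question in deliberately ungrammatical English."},
            {"role": "user", "content": "Why is the sky blue?"},
        ],
    )
    print(resp.choices[0].message.content)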



