
The attention mechanism is capable of computing far more than language: in my thought experiment, where you can magically pluck a weight set from a trillion-dimensional space, only a tiny subset of the weights behind the tokens the machine predicts would be dedicated to language. We have no capability of training such a system at this time, much as we have no way of training a non-differentiable architecture.
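For concreteness, here's a minimal sketch of what the attention mechanism itself computes (plain-numpy scaled dot-product attention; the projection matrices Wq/Wk/Wv are hypothetical stand-ins for the "weight set" in the thought experiment). Nothing in the operation is language-specific; language only enters through whatever weights produce Q, K, and V.

    import numpy as np

    def attention(Q, K, V):
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
        # The mechanism is a generic weighted mixing of vectors;
        # the weights that produce Q, K, V decide what it computes.
        d = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d)                     # (n, n) similarities
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
        return weights @ V                                # mix of value vectors

    # Toy usage: 4 tokens, 8-dim embeddings, random projections
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))                 # token embeddings
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out = attention(x @ Wq, x @ Wk, x @ Wv)     # (4, 8) output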



