
I have the opposite: my most hated illustration.

It's the standard diagram of how a Transformer language model works (https://www.researchgate.net/figure/Transformer-Language-Mod...). When I was trying to figure out transformers, I saw it in every single paper, and it barely helped at all. I think I only got a good understanding when I finally looked at a few implementations.



Also, it is wrong. The paper has "Add & Norm" after each sublayer (post-norm), but the official implementation and every other good implementation use a pre-norm architecture: https://twitter.com/francoisfleuret/status/14671353665032192...
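To make the post-norm vs pre-norm difference concrete, here is a minimal sketch in plain Python. It assumes a single vector input and a LayerNorm without the learned gain and bias. `post_norm_block`, `pre_norm_block`, and the toy doubling sublayer are hypothetical names for illustration, not code from any particular implementation.

```python
import math
from typing import Callable, List

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    # Normalize to zero mean and unit variance.
    # Assumption: learned gain/bias omitted for brevity.
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vector, b: Vector) -> Vector:
    # Element-wise residual addition.
    return [p + q for p, q in zip(a, b)]


def post_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    # "Add & Norm" as in the paper's figure: LayerNorm(x + Sublayer(x)).
    # The residual stream itself gets normalized at every block.
    return layer_norm(add(x, sublayer(x)))


def pre_norm_block(x: Vector, sublayer: Sublayer) -> Vector:
    # Pre-norm: x + Sublayer(LayerNorm(x)).
    # Only the sublayer's input is normalized; the residual path is left untouched.
    return add(x, sublayer(layer_norm(x)))


if __name__ == "__main__":
    double = lambda v: [2.0 * t for t in v]  # hypothetical toy sublayer
    x = [1.0, 2.0, 3.0]
    print("post-norm:", post_norm_block(x, double))
    print("pre-norm: ", pre_norm_block(x, double))
```

With post-norm, the output of every block is re-centred to zero mean. With pre-norm, the input passes through the residual connection unchanged and the sublayer's contribution is added on top. Pre-norm stacks usually also apply one extra LayerNorm after the last block.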



