
The innovation is that everything is one standardized architecture now (transformer models), and you just make it bigger when you need more capacity.
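A minimal sketch of what "just make it bigger" looks like in practice, assuming PyTorch (my choice, not mentioned above): the whole architecture decision collapses into a handful of size hyperparameters on a stock encoder.

```python
import torch
import torch.nn as nn

# Hypothetical sizes; scaling up mostly means turning these knobs.
d_model, n_heads, n_layers = 512, 8, 6

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
model = nn.TransformerEncoder(layer, num_layers=n_layers)

# A batch of 4 sequences, 128 tokens each, already embedded to d_model dims.
x = torch.randn(4, 128, d_model)
out = model(x)
print(out.shape)   # torch.Size([4, 128, 512]), same shape as the input
```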

There's still some room for experimentation if you care about memory or power efficiency, e.g. mixture-of-experts (MoE) models, but those aren't as well understood yet.
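For illustration only, a toy sketch of the MoE routing idea (again assuming PyTorch, with made-up sizes): each token only runs through its top-k experts, which is where the compute/memory tradeoff comes from. Real implementations add load-balancing losses, capacity limits, and fused kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    # Toy mixture-of-experts feed-forward block: each token is routed to its
    # top-k experts; the remaining experts' parameters sit idle for that token.
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                          # x: (tokens, d_model)
        topk = self.gate(x).topk(self.k, dim=-1)   # route on gating scores
        weights = F.softmax(topk.values, dim=-1)   # (tokens, k)
        out = torch.zeros_like(x)
        # Naive per-expert loop; production MoE layers batch tokens per expert.
        for i in range(self.k):
            chosen = topk.indices[:, i]
            for e, expert in enumerate(self.experts):
                mask = chosen == e
                if mask.any():
                    out[mask] += weights[mask, i:i+1] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 512)
print(moe(tokens).shape)                           # torch.Size([16, 512])
```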



There are too many papers throwing transformers at everything without much thought. Transformers are amazing for language but mediocre at everything else. CS researchers tend to pile onto trends hard, so things will probably settle back to normal soon.


I don't know what you mean by "amazing for language". Almost everything is built on transformers nowadays. Image segmentation uses transformers. Text-to-speech uses transformers. Voice recognition uses transformers. There are robotics transformers that take image inputs and output motion sequences. Transformers are inherently multi-modal: they handle whatever you throw at them; it's just that language tends to be a very common input or output.
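To make the multi-modal point concrete, here's a hedged sketch (PyTorch again, with ViT-style 16x16 patching as an illustrative choice): once text tokens and image patches are both projected to the same embedding width, the identical encoder consumes either.

```python
import torch
import torch.nn as nn

d_model = 256
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=4)

# Text: token ids -> embeddings -> (batch, seq_len, d_model).
text_embed = nn.Embedding(32000, d_model)          # hypothetical vocab size
text_tokens = torch.randint(0, 32000, (2, 64))
text_seq = text_embed(text_tokens)

# Images: 16x16 patches -> linear projection -> (batch, n_patches, d_model).
# The encoder itself does not care where the tokens came from.
patchify = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
images = torch.randn(2, 3, 224, 224)
image_seq = patchify(images).flatten(2).transpose(1, 2)   # (2, 196, 256)

print(encoder(text_seq).shape)    # torch.Size([2, 64, 256])
print(encoder(image_seq).shape)   # torch.Size([2, 196, 256])
```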


That is not true. Transformers are being applied all over because they work better than what was used before in so many cases.



