
It's the same thing. Predict the next pixel, or the next token (the same way you'd handle regular images), or infill missing tokens (MAE, the masked autoencoder, is particularly cool lately). Those objectives induce the abstractions and understanding that later get tapped into.
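
To make the two objectives concrete, here's a rough sketch of how the training targets could be built from a sequence of tokens (or image patches treated as tokens). The function names, the `<mask>` placeholder, and the 50% mask ratio are all made up for illustration. As far as I know, MAE proper drops the masked patches from the encoder input rather than substituting a mask token, so treat the infill half as a simplification.

```python
import random

def next_token_pairs(tokens):
    """Autoregressive objective: predict tokens[i] from everything before it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_infill_example(tokens, mask_ratio=0.5, mask_token="<mask>", seed=0):
    """Infill objective: hide a random subset of positions, predict what was hidden.

    mask_ratio, mask_token and seed are hypothetical knobs for this sketch.
    """
    rng = random.Random(seed)
    n_masked = max(1, int(len(tokens) * mask_ratio))
    masked_positions = set(rng.sample(range(len(tokens)), n_masked))
    # Input the model sees: original tokens with the chosen positions hidden.
    corrupted = [mask_token if i in masked_positions else t
                 for i, t in enumerate(tokens)]
    # What the model is asked to recover: position -> original token.
    targets = {i: tokens[i] for i in sorted(masked_positions)}
    return corrupted, targets

# Image patches stand in for tokens here; text tokens work the same way.
patches = ["p0", "p1", "p2", "p3"]
pairs = next_token_pairs(patches)
corrupted, targets = masked_infill_example(patches)
```

Either way the labels come from the data itself, which is the sense in which it's "the same thing" for pixels and tokens.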

