Hacker News

The data is easily generated by compiling the code.


But compiled code loses a lot of the "extra" data, such as comments and variable names. Also, these are "language" models, so I would be surprised if training on binaries were much more efficient than having the model write in some kind of language.

Besides, how do you even check the result without running untrusted code? Do you need to reverse-engineer the binary after every run of the model?



