The data is easily generated by compiling the code. | Hacker News

ninetyninenine 9 months ago | on: AI-assisted coding will change software engineerin...

The data is easily generated by compiling the code.

0xCMP 9 months ago

But compiled code loses a lot of the "extra" data. Also, these are "language" models, so I would be surprised if training on binaries were much more efficient than writing in some kind of language.

Besides, how do you even check the result now without running untrusted code? On every run of the model, do you need to reverse-engineer the binary?
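
To make the point about lost "extra" data concrete, here is a minimal sketch. It uses Python bytecode as a stand-in for a compiled binary, which is an assumption for illustration only; a native compiler with symbols stripped would typically discard more than this. The `area` function and the `compiled_bytes` helper are hypothetical names, not anything from the thread.

```python
import marshal

# Hypothetical source snippet: the comment carries intent (units)
# that the executable form has no need to keep.
SOURCE = '''
def area(width, height):
    # Width and height are in metres; result is in square metres.
    return width * height
'''


def compiled_bytes(source: str) -> bytes:
    """Compile Python source and serialize the resulting code object."""
    code = compile(source, "<example>", "exec")
    return marshal.dumps(code)


blob = compiled_bytes(SOURCE)

# Search the serialized bytecode for a fragment of the comment
# and for one of the parameter names.
comment_survives = b"square metres" in blob
name_survives = b"width" in blob

print("comment survives:", comment_survives)
print("identifier survives:", name_survives)
```

On CPython the comment should be gone from the compiled form, while the parameter name is still there only because Python bytecode keeps local names. A model trained on the compiled artifact alone would never see the note about units.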
