
> You can probe what was learned if you have access to the model; it'll tell you, especially if you do it before applying the safety features.

Does that involve actually parsing the model's weights directly, or effectively asking the model questions to see what was learned?

If the model's weights themselves can be parsed and analyzed directly by humans, that's better than I realized. If it's only accessible through an interface (I'm sure my terminology is off here) similar to the final GPT product, then we still can't really see what was learned.
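
To make the distinction concrete, here's a minimal Python sketch using Hugging Face transformers (the "gpt2" checkpoint is just a stand-in for any base model released before safety tuning, not a claim about any particular system). You can read every parameter directly, but they're just large float tensors, so in practice "probing what was learned" means prompting the model and reading its completions:

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "gpt2"  # illustrative; any base checkpoint works
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name)

  # Direct inspection: every weight is readable, but they're opaque
  # numeric tensors, not facts a human can parse out.
  for name, tensor in list(model.state_dict().items())[:3]:
      print(name, tuple(tensor.shape))

  # Behavioral probing: ask the model and see what it completes.
  prompt = "The capital of France is"
  inputs = tokenizer(prompt, return_tensors="pt")
  outputs = model.generate(**inputs, max_new_tokens=5)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The first loop is why direct inspection mostly doesn't answer the question; the generate call is the kind of querying the parent comment describes, and doing it on a pre-safety-tuning checkpoint is what makes the model more forthcoming about what it absorbed.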


