
That's not how the method works... The full transformer is only needed once, to extract the activation fields, and that step can even be done offline. After that, the teacher can be discarded entirely. The compression figure refers to the size of the learned field representation plus the small student head that operates directly on it. There's no fake claim here: inference with the student does not involve the transformer at all.

If you look at the student-only scripts in the repo, those runs never load the teacher. That's the novel part.
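A minimal sketch of the two-phase pipeline being described, assuming a PyTorch teacher whose hidden states (the "activation fields") are cached to disk. All names here (extract_fields, StudentHead, the field file path) are illustrative, not taken from the repo:

  import torch
  import torch.nn as nn

  # ---- Phase 1: offline extraction (teacher loaded once, then discarded) ----
  def extract_fields(teacher: nn.Module, inputs: torch.Tensor, path: str) -> None:
      teacher.eval()
      with torch.no_grad():
          fields = teacher(inputs)   # activation field, e.g. shape (N, D)
      torch.save(fields, path)       # cached representation; teacher no longer needed

  # ---- Phase 2: student-only inference (teacher never imported or loaded) ----
  class StudentHead(nn.Module):
      """Small head that reads the cached field directly."""
      def __init__(self, field_dim: int, num_classes: int):
          super().__init__()
          self.proj = nn.Linear(field_dim, num_classes)

      def forward(self, field: torch.Tensor) -> torch.Tensor:
          return self.proj(field)

  def student_inference(path: str, head: StudentHead) -> torch.Tensor:
      fields = torch.load(path)      # only the field file is loaded here
      with torch.no_grad():
          return head(fields)

Under that reading, the reported size would cover only the saved fields plus the student head's parameters, which is consistent with the student-only runs never touching the teacher.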


