I don't think it's just the alignment work. I suspect OpenAI+Microsoft are overdoing Reinforcement Learning from Human Feedback (RLHF) with LoRA. Most people's prompts are stupid stuff, so the model becomes stupider. LoRA is one of Microsoft's most cherished discoveries in the field, so they are likely tempted to over-use it.

Perhaps OpenAI should roll back to a known-good older snapshot and be more careful about what they feed into the daily/weekly LoRA fine-tuning runs.
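
For context, here's a minimal sketch of what a LoRA update looks like in plain PyTorch (not OpenAI's actual pipeline, which isn't public; the class name and hyperparameters are made up for illustration). The frozen base weight W gets a trainable low-rank correction (alpha/r) * B @ A, which is why re-running fine-tuning daily or weekly is cheap:

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        # Illustrative wrapper: effective weight = W + (alpha / r) * B @ A,
        # with W frozen and only the low-rank A (r x d_in) and
        # B (d_out x r) matrices trained.
        def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad = False  # the base model never changes
            d_out, d_in = base.weight.shape
            self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # down-projection
            self.B = nn.Parameter(torch.zeros(d_out, r))  # up-projection, zero-init so training starts at the base model
            self.scale = alpha / r

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # base output plus the scaled low-rank correction
            return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

    layer = LoRALinear(nn.Linear(512, 512))
    out = layer(torch.randn(4, 512))  # only A and B receive gradients

Since each run only produces the small A/B matrices while W stays untouched, discarding a bad fine-tune and reverting to an earlier snapshot is also cheap.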

But this is all guesswork because they don't reveal much.



I thought RLHF with LoRA was precisely the alignment method.




