
This period of model scaling at all costs is going to be a major black eye on the industry in a couple of years. We already know that language models are few-shot learners at inference time, and yet OpenAI seems happy throwing petaflops of compute at training models the slow way.
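
To spell out what "few-shot at inference time" means: a frozen model can pick up a task purely from examples placed in the prompt, with no gradient updates at all. A toy illustration (the example is mine, not from any paper cited in this thread):

    # Few-shot in-context learning: the frozen model infers the task
    # from the prompt alone; nothing is trained.
    prompt = (
        "English: cheese -> French: fromage\n"
        "English: house -> French: maison\n"
        "English: bread -> French:"
    )
    # A capable frozen LM completes this with " pain".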

The question is how you can use in-context learning to optimize the model weights themselves. It’s a fun math problem, and it certainly won’t take a billion-dollar supercomputer to solve.
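
One concrete version of this idea is context distillation: train the weights to reproduce, without the prompt, what the frozen model does with the prompt in context. A minimal sketch, assuming a HuggingFace-style causal LM; the model name, prompt, and hyperparameters are illustrative, and this is not the method of either paper linked below:

    import torch
    import torch.nn.functional as F
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    teacher = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # frozen, sees the prompt
    student = AutoModelForCausalLM.from_pretrained("gpt2")         # trained, sees no prompt
    opt = torch.optim.AdamW(student.parameters(), lr=1e-5)

    few_shot_prompt = "Translate English to French:\ncheese -> fromage\nhouse -> maison\n"

    def distill_step(query: str) -> float:
        # Teacher's next-token distribution, conditioned on the in-context examples.
        with torch.no_grad():
            t_ids = tok(few_shot_prompt + query, return_tensors="pt").input_ids
            t_logits = teacher(t_ids).logits[:, -1, :]
        # Student sees only the bare query; KL pushes the prompt's effect into its weights.
        s_ids = tok(query, return_tensors="pt").input_ids
        s_logits = student(s_ids).logits[:, -1, :]
        loss = F.kl_div(F.log_softmax(s_logits, -1),
                        F.softmax(t_logits, -1), reduction="batchmean")
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

Run distill_step over a batch of queries and the student ends up behaving as if the few-shot prompt were permanently prepended, which is the weight-update-from-context idea in miniature.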



Seems like this is already being answered:

https://arxiv.org/abs/2407.10930

https://arxiv.org/abs/2006.04439


Not really. The first paper is just fine-tuning on synthetic data, and the second paper doesn’t optimize the model weights at all.



