This period of model scaling at all costs is going to be a major black eye for the industry in a couple of years. We already know that language models are few-shot learners at inference time, and yet OpenAI seems happy throwing petaflops of compute at training models the slow way.
The question is how you can use in-context learning to optimize the model weights. It’s a fun math problem, and it certainly won’t take a billion-dollar supercomputer to solve.
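One concrete angle on this, sometimes called context distillation: treat the model's own few-shot, in-context predictions as soft targets for an ordinary gradient update on the bare model, so the context's effect gets folded into the weights. Here's a minimal sketch, assuming a HuggingFace-style causal LM; the model name, prompts, learning rate, and step count are all illustrative placeholders, and none of this is claimed to be what OpenAI actually does.

```python
# Rough sketch of context distillation: distill the model's in-context
# behaviour into its weights. Everything concrete here (gpt2, the
# prompts, lr, 10 steps) is a placeholder for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; any causal LM with the same interface works
tok = AutoTokenizer.from_pretrained(name)
teacher = AutoModelForCausalLM.from_pretrained(name).eval()
student = AutoModelForCausalLM.from_pretrained(name).train()

few_shot = "Q: 2+2? A: 4\nQ: 3+5? A: 8\n"  # the in-context examples
query = "Q: 7+6? A:"

ctx_ids = tok(few_shot + query, return_tensors="pt").input_ids
bare_ids = tok(query, return_tensors="pt").input_ids
n = bare_ids.shape[1]  # query tokens shared by both inputs (assumes the
                       # tokenizer splits the query identically in context)

opt = torch.optim.Adam(student.parameters(), lr=1e-5)
for step in range(10):
    with torch.no_grad():
        # teacher sees the few-shot context; keep only its next-token
        # distributions over the trailing query positions
        t_logits = teacher(ctx_ids).logits[:, -n:, :]
    s_logits = student(bare_ids).logits  # student sees no context
    # pull the bare model toward its own in-context predictions
    loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After a few updates like this, the bare query alone starts to elicit the few-shot behaviour. Whether anything in this family can stand in for brute-force pretraining at scale is exactly the open question.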