I'm not sure the progress is about multimodality. After all, Mistral's approach hasn't involved multimodality, and they've kept up.
An efficient model, then data curation, then post-training. Knowing where things are slowing down is of course necessary to be efficient, at least in the short-term competition.