I think that depends on how optimistic or pessimistic one is about how much better the models are going to get. If you're really pessimistic, then there isn't all that much one company could do to be 2x or more ahead already. If you're really optimistic, then it doesn't matter what anyone is doing today, because it's about who finds the next 100x leap.


I don't think it does.

The models have increased greatly in capability, but the competitors have simply kept up, and it's not apparent that they will stop doing so. Furthermore, the breakthroughs, i.e. fundamentally better models, can happen anywhere people can and do try out new architectures, and that can come from surprisingly small places.

It's mostly about culture and being willing to experiment on something that is often very thankless, since most radical ideas do not yield an improvement.


Which is why getting rid of friction is a good idea.

This is R&D. You want a skunkworks culture where you have the best people in the world trying as many new things as possible, and failure is fine as long as it's interesting failure.

Not a culture where every development requires a permission slip from ten other teams, or where everyone is worried about whether they'll still have a job a month from now.


Yes, definitely.


You don't think it does because what you described is only the optimistic take on how much farther LLMs can advance :). The pessimists would look at previous waves of "AI" and point out that each new approach quickly rises to prominence, and then the gains that can be squeezed out of it taper off drastically.

I'm somewhere in the middle. I think there is still more to squeeze out of LLMs, but not nearly the kind of growth we saw from GPT-2 to multimodal reasoning models. Part of the equation is, as you say, a willingness to experiment on radical ideas. The other part is a willingness to recognize when the growth curve is slowing, rather than betting it will always grow enough for a lead in a novel architecture to be meaningful.


I'm not sure the progress is about multimodality. After all, Mistral's approach hasn't involved multimodality, and they've kept up.

It's an efficient model, then data curation, then post-training. Knowing where things are slowing down is of course necessary in order to be efficient, at least in the short-term competition.

