
If you look at LLM performance on benchmarks, the models keep getting better at a fast rate.[1]

We also now have general-purpose models at various sizes, and those can be fine-tuned to specific domains. Advances in multi-modal AI are happening very quickly as well. Model specialization and model reflection (chain of thought, OpenAI's new o1 model, etc.) are also undergoing rapid experimentation.
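As a concrete illustration of the reflection idea: the simplest version is just prompting the model to reason before it answers. Here's a minimal sketch assuming the OpenAI Python SDK (>=1.0) and an OPENAI_API_KEY in the environment; the model name and prompt wording are my own illustrative choices, not anything from a specific paper.

    # Minimal sketch of "model reflection" via chain-of-thought prompting.
    # Assumes the OpenAI Python SDK (>=1.0) and OPENAI_API_KEY; the model
    # name and prompts are illustrative.
    from openai import OpenAI

    client = OpenAI()

    question = ("A bat and a ball cost $1.10 in total. The bat costs $1.00 "
                "more than the ball. How much does the ball cost?")

    # Direct answer: the model commits to a number immediately.
    direct = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )

    # Reflection: ask the model to reason step by step before answering,
    # the simplest form of the chain-of-thought approach mentioned above.
    reflective = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": question + "\nThink through the arithmetic step by step, then give the final answer.",
        }],
    )

    print(direct.choices[0].message.content)
    print(reflective.choices[0].message.content)

Models like o1 bake this loop into training rather than relying on the prompt, but the basic idea is the same: spend inference-time compute on intermediate reasoning before committing to an answer.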

Two demonstrable things that LLMs don't do well currently are (1) generalizing quickly to out-of-distribution examples, and (2) catching logic mistakes in questions that look very similar to training data but have been modified. This video talks about both.[2]
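A hedged sketch of the kind of probe behind point (2): take a riddle the model has almost certainly memorized, change one constraint so the classic answer no longer applies, and see whether it notices. The riddle wording, model name, and use of the OpenAI SDK here are my own illustrative choices, not taken from the video.

    # Probe for pattern-matching vs. reasoning: a familiar riddle with one
    # constraint changed so the memorized answer is wrong.
    # Assumes the OpenAI Python SDK (>=1.0) and OPENAI_API_KEY.
    from openai import OpenAI

    client = OpenAI()

    classic = ("A farmer must cross a river with a wolf, a goat, and a cabbage. "
               "The boat holds the farmer plus one item. How does he do it?")

    # Modified: the boat now holds everything, so the correct answer is the
    # trivial one ("take them all across in one trip"). A model that
    # pattern-matches the classic puzzle will often still give the
    # multi-trip solution.
    modified = ("A farmer must cross a river with a wolf, a goat, and a cabbage. "
                "The boat holds the farmer and all three items at once. "
                "How does he do it?")

    for prompt in (classic, modified):
        reply = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        print(prompt[:40], "->", reply.choices[0].message.content, "\n")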

I think I-JEPA is a pretty interesting line of work towards solving these problems. I also think that multi-modal AI pushes in a similar direction. We need AI to learn abstractions that are more decoupled from the source format, and we need AI that can reflect and modify its plans and update itself in real time.

All these lines of research and development are more or less underway. I think 5-10 years is reasonable for another big advancement in AI capability. We've shown that applying data at scale to simple models works, and now we can experiment with other representations of that data (i.e., other models or other ways to combine LLM inferences).

[1]: https://www.anthropic.com/news/3-5-models-and-computer-use

[2]: https://www.youtube.com/watch?v=s7_NlkBwdj8


