As I expected: I've always believed that, given the right data, an LLM can be trained to imitate reasoning and thereby improve its performance. However, this is still pattern matching, and I suspect the approach will not produce true generalization. As a result, once o1 becomes generally available, we will likely still see persistent hallucinations and faulty reasoning, especially when a problem is sufficiently new or complex that it falls outside the "reasoning programs" or "reasoning patterns" the model learned during the reinforcement learning phase.
https://www.lycee.ai/blog/openai-o1-release-agi-reasoning
So basically it's a kind of overfitting with pattern-matching features? That doesn't undermine the power of LLMs, but it's valuable to study their limitations.