The paper shows that reasoning beats no reasoning, that reasoning burns extra tokens even on simple tasks, and that models fall apart once things get too complicated. Nothing interesting; it’s on the level of what an undergrad would write up for a side project. If it weren’t “from Apple”, no one would be mentioning it.
They cite https://arxiv.org/abs/2503.23829, which actually is interesting: if you have lots of tokens to burn, just attempt the task many times. Repeated sampling can find better solutions than reasoning does on its first try. Only tested on small models, though.
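The core loop there is basically best-of-N sampling plus a verifier. A toy sketch of the idea, not the paper’s actual setup; `generate` and `is_correct` are hypothetical stand-ins for a cheap non-reasoning model call and a task checker (unit tests, exact match, etc.):

```python
import random

def generate(prompt: str) -> int:
    # Stand-in for one independent, cheap model sample.
    return random.randint(0, 99)

def is_correct(answer: int) -> bool:
    # Stand-in for a task verifier; here the "task" is trivially checkable.
    return answer == 42

def best_of_n(prompt: str, n: int = 64) -> int | None:
    # Spend the token budget on n independent tries instead of one
    # long chain of thought; return the first sample that verifies.
    for _ in range(n):
        answer = generate(prompt)
        if is_correct(answer):
            return answer
    return None  # budget exhausted without a verified answer

print(best_of_n("What is 6 * 7?"))
```

The catch, of course, is that this only pays off when verifying an answer is much cheaper than generating one.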