
The goal isn't to assess LLMs' capability at solving any of those problems. The point isn't how good they are at Blocks World puzzles.

The point is to construct non-circular ways of quantifying model performance in reasoning. The fact that the LLM has access to prior exemplars of any given problem is exactly what makes it hard to establish performance in reasoning, as opposed to historical synthesis.



How are these problems more interesting than simple arithmetic or algorithmic problems?


Tower of Hanoi IS an algorithmic problem. It is a standard high-school or college exercise in algorithm design, and probably within a kid's reach when solved intuitively, heuristically, or by brute force for a few disks (e.g., when playing Mass Effect 1 or similar games that embed it as a minigame*).

* https://www.youtube.com/watch?v=1vTBVyhX7n4
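For reference, the classic recursive solution fits in a few lines, and checking a proposed move list is just as mechanical, which is part of what makes the puzzle easy to score without relying on memorized answers. A minimal sketch; the peg names "A"/"B"/"C" and the helper names `hanoi` and `is_valid_solution` are hypothetical, chosen for illustration:

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Return the moves (from_peg, to_peg) that transfer n disks from
    source to target using the classic recursive strategy."""
    if n == 0:
        return []
    # Park the top n-1 disks on the spare peg.
    moves = hanoi(n - 1, source, spare, target)
    # Move the largest disk straight to the target peg.
    moves.append((source, target))
    # Restack the n-1 disks from the spare peg onto the largest disk.
    moves += hanoi(n - 1, spare, target, source)
    return moves


def is_valid_solution(n, moves, source="A", target="C", spare="B"):
    """Simulate a move list and check that every move is legal and that
    all n disks end up on the target peg (hypothetical checker)."""
    # Each peg is a stack; larger numbers are larger disks, top is the end.
    pegs = {source: list(range(n, 0, -1)), spare: [], target: []}
    for frm, to in moves:
        if frm not in pegs or to not in pegs or not pegs[frm]:
            return False  # unknown peg, or moving from an empty peg
        disk = pegs[frm][-1]
        if pegs[to] and pegs[to][-1] < disk:
            return False  # a larger disk may not sit on a smaller one
        pegs[to].append(pegs[frm].pop())
    return pegs[target] == list(range(n, 0, -1))


print(hanoi(2))                                   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(is_valid_solution(3, hanoi(3)))             # True
print(is_valid_solution(2, [("A", "C"), ("A", "C")]))  # False
```

The checker only needs the rules of the puzzle, not a reference answer, so any legal solution is accepted.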


The problems themselves aren't particularly interesting, I suppose. The interesting part is how the complexity of each problem scales with the size of the input (e.g., the number of disks in the Tower of Hanoi).
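To make the scaling concrete for Tower of Hanoi: moving n disks takes moving n-1 disks twice plus one move for the largest disk, so the minimum move count satisfies T(n) = 2T(n-1) + 1 with T(0) = 0, which gives T(n) = 2^n - 1. A small sketch (the function name `min_moves` is just illustrative) that evaluates the recurrence and compares it with the closed form:

```python
def min_moves(n):
    """Minimum number of moves for n disks, via T(n) = 2*T(n-1) + 1, T(0) = 0."""
    t = 0
    for _ in range(n):
        t = 2 * t + 1
    return t


for n in (1, 3, 5, 10, 20):
    # Recurrence and closed form 2**n - 1 should agree for every n.
    print(n, min_moves(n), 2**n - 1)
```

Each extra disk doubles the length of a correct answer (10 disks already need 1023 moves), which is why accuracy as a function of disk count says more than accuracy on any single instance.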



