The system prompt in this experiment forces the model to spell out every concrete move verbally. A human solving the Tower of Hanoi that way gives up around N=4 and goes off to invent a recursive solution instead. Prompted differently, the LLM would solve these puzzles just fine.
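For reference, the recursive solution I mean looks roughly like this. It's a minimal sketch, not anything taken from the paper: the function name `hanoi` and the peg labels "A", "B", "C" are my own picks for illustration.

```python
def hanoi(n, source="A", target="C", spare="B"):
    """Yield (disk, from_peg, to_peg) moves that transfer n disks from source to target."""
    if n == 0:
        return
    # Park the n-1 smaller disks on the spare peg.
    yield from hanoi(n - 1, source, spare, target)
    # Move the largest disk straight to the target.
    yield (n, source, target)
    # Bring the n-1 smaller disks from the spare peg onto the target.
    yield from hanoi(n - 1, spare, target, source)


if __name__ == "__main__":
    moves = list(hanoi(4))
    print(len(moves))  # 15, i.e. 2**4 - 1
```

The whole procedure is a handful of lines, while the move list it generates doubles with every added disk (2^N - 1 moves). That is the gap between describing the solution and enumerating it by hand.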
Here is my complete review/analysis of the paper: https://www.linkedin.com/pulse/art-abstraction-human-advanta...
edit: fixed typo