Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Comparing "turn photo into LaTeX" to "translate theorems into Lean" is like comparing a child's watercolor drawing to the Mona Lisa.


… no? After the LaTeX output, I told stock GPT4o that the answer was "all even integers", and asked for the statement in Lean. I had to make two changes to its output (both of which were compile-time errors, not misformalisations), and it gave me the formalisation of the difficult direction of the problem.

Both changes were trivial: it had one incorrect (but unnecessary) import, and it used the syntax from Lean 3 instead of Lean 4 in one lambda definition. A system that was trained harder on Lean would not make those mistakes.

The one actual error it made was in not proposing that the other direction of the "if and only if" is required. Again, I am quite confident that this formalisation failure mode is not hard to solve in a system that is, like, actually trained to do this.

Obviously formalising problems that a working mathematicican solves is dramatically harder than formalising IMO problems, and is presumably way ahead of the state of the art.


> I am quite confident that this formalisation failure mode is not hard to solve in a system that is, like, actually trained to do this.

Why?


This is really not the kind of problem LLMs are bad at! But since you insist, given the LaTeX, Claude 3.5 Sonnet correctly stated the theorem in full while inventing notation for the floor operation (it did correctly note unprompted what the right function was and how to obtain it from mathlib, but it incorrectly attempted to define syntax sugar for it).


The hard part isn't getting the formalisation right sometimes, it's getting it right reliably (and unlike mistakes in the formal part, there's no way for the system to check itself in that part).

IDK, even for translation between languages outside of mathematics, missing small qualifiers that change the whole meaning of the sentence is a worrying failure mode I've seen, and with mathematical problems there are a lot more cases like that.


I think that's exagerating a bit. If you are familiar with both Lean and LaTeX then I think transcribing these problems to Lean only takes about twice as long as transcribing them to LaTeX.




Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: