I am fairly optimistic about LLMs as a human math -> theorem-prover translator, and as a fan of Idris I am glad that the AI community is investing in Lean. As the author shows, the answer to "Can AI be useful for automated mathematical work?" is clearly "yes."

But I am confident the answer to the question in the headline is "no, not for several decades." It's not just the underwhelming benchmark results discussed in the post, or the general concern that hard undergraduate math exercises different skillsets than ordinary research math. IMO the deeper problem is a basic gap: LLMs can seemingly do formal math at the level of a smart graduate student, yet fail at quantitative/geometric reasoning problems designed for fish. I suspect this holds for o3, based on one of the ARC problems it wasn't able to solve: https://substackcdn.com/image/fetch/f_auto,q_auto:good,fl_pr... (via https://www.interconnects.ai/p/openais-o3-the-2024-finale-of...) ANNs are simply not able to form abstractions; they can only imitate them via enormous amounts of data and compute. I would say there has been zero progress on "common sense" math in computers since the invention of Lisp: we are still faking it with expert systems, even if LLM expert systems are easier to build at scale with raw data.

It is the same old problem where an ANN can attain superhuman performance on level 1 of Breakout, but it has to be retrained for level 2. I am not convinced it makes sense to say AI can do math if AI doesn't understand what "four" means with the same depth as a rat, even if it can solve sophisticated modular arithmetic problems. In human terms, does it make sense to say a straightedge-and-compass AI understands Euclidean geometry if it's not capable of understanding the physical intuition behind Euclid's axioms? It makes more sense to say it's a brainless tool that helps with the tedium and drudgery of actually proving things in mathematics.



To give a sense of scale: it's not that o3 failed to solve that red-blue rectangle problem once. o3 spent thousands of GPU-hours putting out text about that problem, producing by my math about a million pages of text, and did not find the answer anywhere in those pages. For other problems it did find the answer around the million-page mark, since at the ~$3,000-per-problem spend setting the score was still slowly creeping up.
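For the curious, here is the back-of-envelope version of "about a million pages." The token price and tokens-per-page figures below are my own guesses, not reported numbers; only the ~$3,000 spend comes from the discussion above:

    # Rough sanity check of the "million pages" figure.
    # All inputs are assumptions, not reported numbers.
    spend_per_problem = 3_000     # dollars, per the high-compute ARC runs
    price_per_m_tokens = 10       # dollars per million output tokens (guess)
    tokens_per_page = 400         # ~300 words/page at ~1.3 tokens/word (guess)

    tokens = spend_per_problem / price_per_m_tokens * 1_000_000
    pages = tokens / tokens_per_page
    print(f"{tokens:,.0f} tokens ~= {pages:,.0f} pages")
    # -> 300,000,000 tokens ~= 750,000 pages: order of a million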


If the trajectory of the past two years is any guide, things that can be done at great compute expense now will rapidly become possible for a fraction of the cost.


The trajectory is not a guide, unless you count the recent plateauing.


It can take my math, point out a step I missed, and then show me the correct procedure, but still get the wrong result because it can't reliably multiply 2-digit numbers.
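If anyone wants to check that claim themselves, a quick harness is easy to sketch. ask_model here is a hypothetical stand-in for whatever API you're using, not a real library call:

    import random

    def ask_model(prompt: str) -> str:
        """Hypothetical stand-in for an LLM call; wire up a real API here."""
        raise NotImplementedError

    def two_digit_accuracy(trials: int = 100) -> float:
        """Fraction of random 2-digit products the model answers exactly."""
        correct = 0
        for _ in range(trials):
            a, b = random.randint(10, 99), random.randint(10, 99)
            reply = ask_model(f"What is {a} * {b}? Reply with the number only.")
            correct += reply.strip() == str(a * b)
        return correct / trials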


Better than an average human then.


Different than an average human.


It's a "language" model (LLM), not a "math" model. When it is generating your answer, predicting and outputting word after word, it is _not_ multiplying your numbers internally.
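To make that concrete, generation is roughly a loop like the sketch below. next_token_probs is a stand-in for the network's softmax over its vocabulary; the point is that each digit of the answer is just another predicted token:

    def next_token_probs(context: list[str]) -> dict[str, float]:
        """Stand-in for the model's softmax over its vocabulary."""
        raise NotImplementedError

    def generate(prompt: list[str], max_tokens: int = 20) -> list[str]:
        # Greedy autoregressive decoding: every digit of "42 * 17 = 714"
        # is emitted as the highest-probability next token given the text
        # so far; no multiplication routine is ever invoked.
        tokens = list(prompt)
        for _ in range(max_tokens):
            probs = next_token_probs(tokens)
            tokens.append(max(probs, key=probs.get))
        return tokens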


Yes, I know. It's just kind of interesting how it can make inferences about complicated things but fail at multiplications that would almost certainly have appeared in its training set many times (two-digit by two-digit).


Just a comment: the example o3 got wrong was actually underspecified: https://anokas.substack.com/p/o3-and-arc-agi-the-unsolved-ta...

Which is actually a problem I have with ARC (and IQ tests more generally): it is computationally cheaper to go from ARC transformation rule -> ARC problem than it is the other way around. But this means it’s pretty easy to generate ARC problems with non-unique solutions.
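A toy sketch of that asymmetry (grids as lists of ints; the candidate rule set is hypothetical):

    # Forward direction: applying a known rule to a grid is one cheap pass.
    def recolor(grid, mapping):
        return [[mapping.get(c, c) for c in row] for row in grid]

    pair = ([[1, 0], [0, 2]], recolor([[1, 0], [0, 2]], {1: 3, 2: 5}))

    # Inverse direction: recovering the rule means searching a large rule
    # space for candidates consistent with the examples -- and nothing
    # forces exactly one candidate to survive, hence non-unique solutions.
    def consistent_rules(pairs, candidate_rules):
        return [r for r in candidate_rules
                if all(r(x) == y for x, y in pairs)]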



