
Larger LLMs do pretty well with this.

Smaller ones don't.



Large ones do better than small ones, but still worse than I would have expected before I tested them. E.g. `o1` doesn't know things that are repeated several times on Wikipedia.


o1 isn't that large, and its emphasis is on reasoning rather than memorization.

Try the largest Llama models, and phrase your prompt as a sentence to be completed rather than as a question.
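Roughly what I mean, as a minimal sketch with Hugging Face transformers (the model name and the fact being completed are just placeholders; a base, non-chat checkpoint is assumed):

  from transformers import AutoModelForCausalLM, AutoTokenizer

  model_name = "meta-llama/Llama-2-70b-hf"  # placeholder; any large base model
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

  # Completion-style phrasing: state the fact as a sentence and let the model
  # finish it, instead of asking "What year was the Eiffel Tower completed?"
  prompt = "The Eiffel Tower was completed in the year"

  inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
  outputs = model.generate(**inputs, max_new_tokens=10, do_sample=False)
  print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Base models are trained on next-token prediction, so a half-finished sentence puts them much closer to the distribution they memorized than a question/answer format does.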



