
ChatGPT is laughably terrible at double entry accounting. A few weeks ago I was trying to use it to figure out a reasonable way to structure accounts for a project given the different business requirements I had. It kept disappearing money when giving examples. Pointing it out didn’t help either, it just apologized and went on to make the same mistake in a different way.
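For context, the invariant it kept breaking is trivially checkable: in double-entry bookkeeping every transaction's debits must equal its credits, so money can never just vanish. A toy sketch (account names are made up):

    # Toy check of the double-entry invariant: per transaction,
    # total debits == total credits. Account names are made up.
    from decimal import Decimal

    def is_balanced(entries):
        # entries: list of (account, debit, credit), amounts as strings
        debits  = sum(Decimal(d) for _, d, _ in entries)
        credits = sum(Decimal(c) for _, _, c in entries)
        return debits == credits

    txn = [
        ("Cash",             "100.00",   "0.00"),
        ("Accounts Payable",   "0.00", "100.00"),
    ]
    assert is_balanced(txn)  # nothing "disappears"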



Using a system based on randomness for a process that must occur deterministically is probably the wrong solution.

I'm running into similar issues trying to use LLMs for logic and reasoning.

They can do it (surprisingly well, once you disable the friendliness that prevents it), but you get a different random subset of correct answers every time.

I don't know if setting temperature to 0 would help. You'd get the same output every time, but it would be the same incomplete / wrong output.

Probably a better solution is a multi-phase approach, where you generate a bunch of outputs and then collect and filter them.
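Something like self-consistency sampling: sample several answers at nonzero temperature and keep whatever survives a filter, e.g. a simple majority vote. A rough sketch against the OpenAI chat API (model name, prompt and the voting rule are just placeholders):

    # Rough sketch of generate-then-filter: sample n answers at
    # nonzero temperature, then majority-vote over them.
    # Model name, prompt and voting rule are placeholders.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()

    def sample_answers(question, n=5, temperature=0.8):
        resp = client.chat.completions.create(
            model="gpt-4o",          # placeholder model
            messages=[{"role": "user", "content": question}],
            temperature=temperature,
            n=n,                     # n independent completions
        )
        return [c.message.content.strip().lower() for c in resp.choices]

    answers = sample_answers("Is 2^31 - 1 prime? Answer only yes or no.")
    winner, votes = Counter(answers).most_common(1)[0]
    print(winner, f"({votes}/{len(answers)} samples agreed)")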


> They can do it (surprisingly well, once you disable the friendliness that prevents it) ...

Interesting! :D Do you mind sharing the prompt(s) that you use to do that?

Thanks!!


You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, demand clarification. Your goal is not to help me feel good — it’s to help me think better.

Keep your responses short and to the point. Use the Socratic method when appropriate.

When enumerating assumptions, put them in a numbered list. Make the list items very short: full sentences not needed there.

---

I was trying to clone Gemini's "thinking", which I often found more useful than its actual output! I failed, but the result is interesting, and somewhat useful.

GPT 4o came up with the prompt. I was surprised by "never use friendly language", until I realized that avoiding hurting the user's feelings would prevent the model from telling the truth. So it seems to be necessary...

It's quite unpleasant to interact with, though. Gemini solves this problem by doing the "thinking" in a hidden box, and then presenting it to the user in soft language.
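For anyone who wants to try it: the whole prompt just goes in as the system message, nothing fancier. A minimal sketch (model name is a placeholder, and the string is truncated here; paste the full text from above):

    # Minimal sketch: the critic prompt above goes in as the system
    # message; everything else is an ordinary chat call.
    from openai import OpenAI

    CRITIC_PROMPT = (
        "You are an inhuman intelligence tasked with spotting logical "
        "flaws and inconsistencies in my ideas. ..."  # paste the full prompt above
    )

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder
        messages=[
            {"role": "system", "content": CRITIC_PROMPT},
            {"role": "user", "content": "Everyone I pitched loved my idea, so it must be good."},
        ],
    )
    print(resp.choices[0].message.content)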


Have you tried Deepseek-R1?

I run it locally and read the raw thought process, and I find it very useful (it can be ruthless at times) to see this before it tacks on the friendliness.

Then you can see its planning process for tacking on the warmth/friendliness: "but the user seems proud of... so I need to acknowledge..."
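Reading that raw block locally is just string splitting, since R1 emits its reasoning inside <think>...</think> before the polished answer. A rough sketch assuming an ollama setup (model tag, port and prompt are placeholders):

    # Rough sketch: pull the raw chain-of-thought out of a locally run
    # DeepSeek-R1 via ollama's REST API. R1 emits its reasoning inside
    # <think>...</think> before the final answer. Model tag, port and
    # prompt are placeholders for whatever your local setup uses.
    import re
    import requests

    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:7b",  # placeholder tag
            "prompt": "Critique this plan: rewrite our billing system in a weekend.",
            "stream": False,
        },
    )
    text = resp.json()["response"]

    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    raw_thoughts = m.group(1).strip() if m else ""
    final_answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

    print("--- raw thoughts (pre-friendliness) ---")
    print(raw_thoughts)
    print("--- polished answer ---")
    print(final_answer)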

I don't think Gemini's "thoughts" are the raw CoT process; they're summarized / cleaned up by a small model before being returned to you (same as with OpenAI models).


That's fascinating. I've been trying to get other models to mimic Gemini 2.5 Pro's thought process, but even with examples, they don't do it very well. Which surprised me, because I think even the original (no RLHF) GPT-3 was pretty good at following formats like that! But maybe there's not enough training data in that format for it to "click".

It does seem similar in structure to Gemini 2.0's output format with the nested bullets though, so I have to assume they trained on synthetic examples.



> Pointing it out didn’t help either, it just apologized and went on to make the same mistake in a different way.

They really should modify it to take out that whole loop where it apologizes, claims to recognize its mistake, and then continues to make the mistake that it claimed to recognize.


You'd think accounting students would catch on.


> me, just submitted my taxes for last year with a lot of help from ChatGPT: :eyes:



