It means the temperature should be set to 0 (which not every provider supports) so that the output becomes, in principle, deterministic. Right now, with most models, if you give the same prompt twice you get two different answers.
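For what it's worth, here's roughly what that looks like with the OpenAI Python client; a sketch only, the model name and prompt are placeholders, and the `seed` parameter is best-effort (not every provider exposes an equivalent):

    # Request deterministic-ish output: temperature 0 plus a fixed seed.
    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Write a haiku about caching."}],
        temperature=0,  # greedy-ish decoding; not every provider honors this fully
        seed=1234,      # best-effort reproducibility where supported
    )
    print(resp.choices[0].message.content)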
Even at temperature 0 you might get different answers, depending on your inference engine. There can be hardware differences as well as software issues: vLLM documents, for example, that with batching you may get different answers depending on where in the batch your query landed.
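An easy way to see this for yourself is to fire the same prompt repeatedly at temperature 0 and count distinct completions. A sketch assuming an OpenAI-compatible endpoint (e.g. a local vLLM server); the base_url and model name are placeholders:

    from collections import Counter
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")
    outputs = Counter()
    for _ in range(10):
        resp = client.chat.completions.create(
            model="my-model",  # placeholder
            messages=[{"role": "user", "content": "List three prime numbers."}],
            temperature=0,
        )
        outputs[resp.choices[0].message.content] += 1

    # More than one distinct answer at temp 0 usually points at
    # batching or kernel-level nondeterminism, not sampling.
    print(len(outputs), "distinct completions:", outputs)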
Production inference is not deterministic because of sharding (parameter weights spread across several GPUs on the same machine, or across experts in an MoE), timing-based kernel selection (e.g. torch.backends.cudnn.benchmark), and batched routing in MoEs. Probably best to host a small model yourself.
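If you do self-host, greedy decoding plus PyTorch's determinism knobs gets you most of the way on a single GPU. A minimal sketch with Hugging Face transformers; the model name is a placeholder, and full CUDA determinism also needs the CUBLAS_WORKSPACE_CONFIG environment variable set before any GPU work:

    import os
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")  # needed for deterministic cuBLAS

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    torch.manual_seed(0)
    torch.backends.cudnn.benchmark = False                    # no timing-based kernel selection
    torch.use_deterministic_algorithms(True, warn_only=True)  # prefer deterministic kernels

    name = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder small model
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).eval()

    inputs = tok("Explain CRDTs in one sentence.", return_tensors="pt")
    with torch.no_grad():
        out = model.generate(**inputs, do_sample=False, max_new_tokens=64)  # greedy decoding
    print(tok.decode(out[0], skip_special_tokens=True))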