Hacker News

Even at temperature 0, you might get different answers, depending on your inference engine. There can be hardware differences as well as software ones: vLLM, for example, documents that if you're using batching, you might get different answers depending on where in the batch your query landed.
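A minimal pure-Python sketch of why this happens (with contrived, hypothetical numbers): temperature-0 decoding is just an argmax over the logits, each logit is a floating-point reduction, and floating-point addition is not associative. A different accumulation order (a different kernel, or different padding from the batch you landed in) can perturb a near-tied logit and flip the argmax.

```python
def logit_via_dot(weights, acts, order):
    """Dot product accumulated in the given index order.

    Changing `order` models a different reduction schedule on the
    hardware; mathematically the result is identical, but in floating
    point it need not be.
    """
    total = 0.0
    for i in order:
        total += weights[i] * acts[i]
    return total

# Contrived values chosen to expose rounding: 1e16 + 1.0 rounds back
# to 1e16 in float64, so the order of the cancelling terms matters.
weights = [1e16, 1.0, -1e16, 1.0]
acts    = [1.0, 1.0, 1.0, 1.0]

forward  = logit_via_dot(weights, acts, [0, 1, 2, 3])  # -> 1.0
shuffled = logit_via_dot(weights, acts, [0, 2, 1, 3])  # -> 2.0

# If a competing token's logit sits between the two results, the
# greedy (temp-0) pick flips even though nothing "random" happened.
other_logit = 1.5
pick_forward  = 0 if forward  > other_logit else 1
pick_shuffled = 0 if shuffled > other_logit else 1
print(pick_forward, pick_shuffled)  # different tokens chosen
```

Real logits differ by far more than one ulp most of the time, which is why temp-0 output is *mostly* stable; the flips show up on near-ties, and batching changes which reduction schedule (and padding) your tokens see.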





