Assemblers and compilers are (practically) deterministic. LLMs are not.

chowells · 2025-05-18T18:35:34 1747593334

That's the wrong distinction, and bringing it up causes pointless arguments like are in the replies.

The right distinction is that assemblers and compilers have semantics and an idea of correctness. If your input doesn't lead to a correct program, you can find the problem. You can examine the input and determine whether it is correct. If the input is wrong, it's theoretically possible to find the problem and fix it without ever running the assembler/compiler.

Can you examine a prompt for an LLM and determine whether it's right or wrong without running it through the model? The idea is ludicrous. Prompts cannot be source code. LLMs are fundamentally different from programs that convert source code into machine code.

This is something like "deterministic" in the colloquial sense, but not at all in the technical sense. And that's where these arguments come from. I think it's better to sidestep them and focus on the important part: compilers and assemblers are intended to be predictable in terms of semantics of code. And when they aren't, it's a compiler bug that needs to be fixed, not an input that you should try rephrasing. LLMs are not intended to be predictable at all.

So focus on predictability, not determinism. It might forestall some of these arguments that get lost in the weeds and miss the point entirely.

traverseda · 2025-05-18T12:22:38 1747570958

LLMs are deterministic. So far every vendor is giving them random noise in addition to your prompt though. They don't like have a free will or a soul or anything, you feed them exactly the same tokens exactly the same tokens will come out.

mmoskal · 2025-05-18T19:56:16 1747598176

If you change one letter in the prompt, however insignificant you may think it is, it will change the results in unpredictable ways, even with temperature 0 etc. The same is not true of renaming a variable in a programming language, most refactorings etc.

jnwatson · 2025-05-18T15:08:06 1747580886

Only if you set temperature to 0 or have some way to set the random seed.

vlovich123 · 2025-05-18T15:40:56 1747582856

Locally that’s possible but for multi tenant ones I think there’s other challenges related to batch processing (not in terms of the random seed necessarily but because of other non determinism sources).

codr7 · 2025-05-18T23:13:36 1747610016

That's not how they are being used though, is it?

pjmlp · 2025-05-18T10:05:21 1747562721

Missed the part?

> Most likely we will still need some kind of formalisation tools to tame natural language uncertainties, however most certainly they won't be Python/Rust like

Wowfunhappy · 2025-05-18T10:09:03 1747562943

No, I didn't miss it. I think the fact that LLMs are non deterministic means we'll need a lot more than "some kind of formalization tools", we'll need real programming languages for some applications!

pjmlp · 2025-05-18T10:12:49 1747563169

How deterministic are C compilers at -O3, while compiling exactly the same code across various kinds of vector instructions, and GPUs?

We are already on the baby steps down that path,

https://code.visualstudio.com/docs/copilot/copilot-customiza...

spookie · 2025-05-18T10:52:08 1747565528

Take a look at the following: https://reproduce.debian.net/

Granted, lot's of different compilers and arguments depending on packages. But you need to match this reproducibility in a fancy pants 7GL

pjmlp · 2025-05-18T12:14:20 1747570460

And still its behaviour isn't guaranteeded if the hardware isn't exactly the same as where the binaries were produced.

That is why on high integrity computing all layers are certified, and any tiny change requires a full stack re-certification.

saagarjha · 2025-05-19T08:22:55 1747642975

There is a world of difference between "my code is generated by an LLM where a tiny change in the prompt might produce an entirely different program" and "this CPU doesn't have AVX2".

almostgotcaught · 2025-05-18T11:18:11 1747567091

You moved the goal posts and declared victory - that's not what deterministic means. It means same source, same flags, same output. Under that definition, the actual definition, they're 99.9% deterministic (we strive for 100% but bugs do happen).

pjmlp · 2025-05-18T12:12:26 1747570346

Nope the goal stayed at the same position, people argue for deterministic results while using tools that by definition aren't deterministic unless a big chunck of work is done ensuring that it is indeed.

"It means same source, same flags, same output", it suffices to change the CPU and the Assembly behaviour might not be the same.

almostgotcaught · 2025-05-18T12:52:28 1747572748

Do you like have any idea what you're talking about? Or are you just making it up for internet points? The target is part of the input.

Lemme ELI5

https://github.com/llvm/llvm-project/tree/main/llvm/test/Cod...

You see how this folder has folders for each target? Then within each target folder there are tests (thousands of tests)? Each of those tests is verified deterministically on each commit.

Edit: there's an even more practical way to understand how you're wrong: if what you were saying were true, ccache wouldn't work.

sitkack · 2025-05-18T15:01:05 1747580465

You keep being you, but you also have to admit, not only do you move goal posts, but most of arguments are on dollies, performing elaborate choreographies that would make Merce Cunningham blush.

fulafel · 2025-05-18T15:40:27 1747582827

pjmlp did originally say "compiling exactly the same code across various kinds of vector instructions, and GPUs".

ModernMech · 2025-05-18T15:35:04 1747582504

You have a point, but in making it I think you're undermining your argument.

Yes, it's true that computer systems are nondeterministic if you deconstruct them enough. Because writing code for a nondeterministic machine is fraught with peril, as an industry we've gone to great lengths to move this nondeterminism as far away from programmers as possible. So they can at least pretend their code code is executing in a deterministic manner.

Formal languages are a big part of this, because even though different machines may execute the program differently, at least you and I can agree on the meaning of the program in the context of the language semantics. Then we can at least agree there's a bug and try to fix it.

But LLMs bring nondeterminism right to the programmer's face. They make writing programs so difficult that people are inventing new formalisms, "prompt engineering", to deal with them. Which are kind of like a mix between a protocol and a contract that's not even enforced. People are writing full-on specs to shape the output of LLMs, taking something that's nondeterministic and turning into something more akin to a function, which is deterministic and therefore useful (actually as an aside, this also harkens to language design, where recently languages have been moving toward immutable variables and idempotent functions -- two features that combined help deal with nondeterministic output in programs, thereby making them easier to debug).

I think what's going to happen is the following:

- A lot of people will try to reduce nondeterminism in LLMs through natural language constrained by formalisms (prompt engineering)

- Those formalisms will prove insufficient and people will move to LLMs constrained with formal languages that work with LLMs. Something like SQL queries that can talk to a database.

- Those formal languages will work nicely enough to do simple things like collecting data and making view on them, but they will prove insufficient to build systems with. That's when programming languages and LLMs come back together, full circle.

Ultimately, my feeling is the idea we can program without programming languages is misunderstanding what programming languages are; programming languages are not for communicating with a computer, they are for communicating ideas in an unambiguous way, whether to a computer or a human or an LLM. This is important whether or not a machine exists to execute those programs. After all, programming languages are languages.

And so LLMs cannot and will not replace programming languages, because even if no computers are executing them, programs still need to be written in a programming language. How else are we to communicate what the program does? We can't use English and we know why. And we can't describe the program to the LLM in English for the same reason. The way to describe the program to the LLM is a programming language, so we're stuck building and using them.