Also, you know who did measure every angle to make sure it was correct? The engineers who put together the initial design. They sure as hell took their time getting every detail of the design right before it ever made it to the assembly line.
You misconstrue the analogy. The robot isn’t equivalent to the code in this analogy. It’s the thing that generates the code.
The robot operates deterministically: it has a fixed input and a fixed output. This is what makes it reliable.
Your "AI coder" is nothing like that. It's non-deterministic on its best day, and it gets everything thrown at it, which makes it even more of a coin toss. This seriously undermines any expectation of reliability.
The guy's comparison shows a lack of understanding of either system.
I totally understand that inversion but I think it's a bad analogy.
Industrial automation works by taking a rigorously specified design developed by engineers and combining it with rigorous quality control processes to ensure the inputs and outputs remain within tolerances. You first have to have a rigorous spec; then you can design a process for manufacturing a lot of widgets while checking 1 out of every 100 of them for their tolerances.
You can only get away with not measuring a given angle on widget #13525 because you're producing many copies of exactly the same thing, you measured that angle on widget #13500 and widget #13400 and so on, and the variance in your sampled widgets is within the tolerances specified by the engineer who designed the widget.
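To make the QC half of the analogy concrete, here's a rough sketch in Python of what that sampling check amounts to (the nominal angle, tolerance, and sampling rate are made up for illustration):

```python
# Rough sketch of sampling-based QC: inspect every Nth widget and check that
# the measured angle stays within the tolerance from the design spec.
# The nominal angle, tolerance, and sampling rate here are hypothetical.

DESIGN_ANGLE_DEG = 90.0   # nominal angle from the (hypothetical) engineering spec
TOLERANCE_DEG = 0.5       # allowed deviation from nominal

def audit_run(measured_angles: list[float], sample_every: int = 100) -> bool:
    """Return True if every sampled widget is within tolerance."""
    sampled = measured_angles[::sample_every]
    return all(abs(angle - DESIGN_ANGLE_DEG) <= TOLERANCE_DEG for angle in sampled)
```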
There's no equivalent to the design stage or to the QC stage in the vibe-coding process advocated for by the person quoted above.
I don't know what you mean by "the code it creates is deterministic", but the process an LLM uses to generate code based on an input is definitely not entirely deterministic.
To put it simply, the chances that an LLM will output the same result every time given the same input are low. The LLM does not operate deterministically, unlike the manufacturing robot, which will output the same door panel every single time. Or as ChatGPT put it:
> The likelihood of an LLM like ChatGPT generating the exact same code for the same prompt multiple times is generally low.
For any given seed value, the output of an LLM will be identical; it is deterministic. You can try this at home with Llama.cpp by specifying a seed value when you load an LLM, and then seeing that for a given input the output will always be the same. Of course there may be some exceptions (cosmic ray bit flips). Also, if you are only using online models, you can't set the seed value, plus there are multiple models, so multiple seeds. In summary, LLMs are deterministic.
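If you'd rather script the experiment than use the CLI, here is a minimal sketch using the llama-cpp-python bindings. The model path is a placeholder, and I'm assuming a recent version of the bindings where `seed` is a constructor argument:

```python
# Minimal sketch: same local model, same seed, same prompt -> same tokens.
# Assumes llama-cpp-python and a local GGUF file; the path is a placeholder.
from llama_cpp import Llama

PROMPT = "Write a function that reverses a string."

def generate(seed: int) -> str:
    llm = Llama(model_path="models/example-7b.gguf", seed=seed, verbose=False)
    out = llm(PROMPT, max_tokens=64, temperature=0.8)
    return out["choices"][0]["text"]

# Two runs with the same seed should produce identical completions
# (barring the hardware-level exceptions mentioned above).
assert generate(80085) == generate(80085)
```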
> the process an LLM uses to generate code based on an input is definitely not entirely deterministic
Technically correct is the least useful kind of correct when it's wrong in practice. And in practice the process AI coding tools use to generate code is not deterministic, which is what matters. To make matters worse for the comparison with a manufacturing robot, even the input is never the same. While a robot gets the exact command for a specific motion and the exact same piece of sheet metal, in the same position, a coding AI is asked to work with varied inputs and on varied pieces of code.
Even stamping metal could be called "non-deterministic" since there are guaranteed variations, just within determined tolerances. Does anyone define tolerances for generated code?
That's why the comparison shows a lack of understanding of either system.
I don't really understand your point.
An LLM is loaded with a seed value, which is a number. The number may be chosen through some pseudo- or truly random process, or specified manually. For any given seed value, say 80085, the LLM will always generate exactly the same tokens. It is not like stamped sheet metal, because it is digital information, not matter. Say you load up R1, give it a seed value of 80085, and then say "hi" to the model. The model will output the exact same response, to the bit: same letters, same words, same punctuation, same order. Deterministic.
There is no way you can say that an LLM is non-deterministic, because that would be WRONG.
First, you're assuming a brand new conversation: no context. Second, you're assuming a local-first LLM, because a remote one could change behavior at any time. Third, the way the input is expressed is inexact, so minor differences in input can have an effect. Fourth, if the data to be operated on has changed, you will be using new parts of the model that were never previously used.
But I understand how nuance is not as exciting as using the word WRONG in all caps.
Arguing with "people" on the internet...
Nuance is definitely a word of the year, and if you look at many models you can actually see it's a high-probability token.
Addressing your comment: there was no assumption or indication on my part that determinism only applies to a new "conversation". Any interaction with any LLM is deterministic, same conversation included, for any given seed value. Yes, I'm talking about local systems, because how are you going to know what is going on on a remote system?
On a local system, with a local LLM, if the input is expressed in the same way, the output will be generated in the same way, given the same token context and so on.
That means, for a given seed value, after "hi" the model may say "hello", and then the human's response may be "how ya doin'", and then the model would say "so so, how ya doin?", and every single time, if the human or agent inputs the same tokens, the model will output the same tokens, for that seed value. This is not really up for question, or in doubt, or really anything to disagree about. Am I not being clear? You can ask your local LLM or remote LLM and they will certainly confirm that the process by which a language model generates is deterministic, by definition. Same input means same output; again I must mention that the exception is hardware bit flips, such as those caused by cosmic rays, and that's just to emphasize how very deterministic LLMs are.

Of course, as you may know, online providers stage and mix LLMs, so for sure you are not going to be able to see that you are wrong by playing with chatgpt, grok/q, gemini, or whatever other online LLMs you are familiar with. If you have a system capable of offline or non-remote inference, you can see for yourself that you are wrong when you say that LLMs are non-deterministic.
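For the multi-turn case specifically, the same kind of check can be written against a replayed conversation. A sketch, again assuming llama-cpp-python and a placeholder model path:

```python
# Sketch of the multi-turn claim: replay the same conversation history with
# the same seed and the replies should match token for token.
from llama_cpp import Llama

CONVERSATION = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "how ya doin'"},
]

def next_reply(seed: int) -> str:
    llm = Llama(model_path="models/example-7b.gguf", seed=seed, verbose=False)
    out = llm.create_chat_completion(messages=CONVERSATION, max_tokens=32)
    return out["choices"][0]["message"]["content"]

assert next_reply(80085) == next_reply(80085)  # same seed, same history, same reply
```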
I feel this is technically correct but intentionally cheating. No one - including the model creators - expects that to be the interface; it undermines the entire value proposition of using an LLM in the first place if I need to engineer the inputs to ensure reproducibility. I'd love to hear some real-world scenarios that do this where it wouldn't be simpler to NOT use AI.
When should a model's output be deterministic?
When should a model's output be non-deterministic?
When many humans interact with the same model, then maybe the model should try different seed values, and make measurements.
When model interaction is limited to a single human, then maybe the model should try different seed values, and make measurements.
An entire generation of devs who grew up using unaudited, unverified, unknown-license code. Code which, at a moment's notice, can be sold to a threat actor.
And I've seen devs try to add packages to the project without even considering the source. Using forks of forks of forks, without considering the root project, or examining whether it's just a private fork, or which fork is most active and updated.
If you don't care about that code, why care about AI code? Or even your own?
After putting off learning JS for a decade, I finally bit the bullet since I can talk to an LLM about it while going through the slog of getting a mental model up and running.
After a month, I can say that the inmates run that whole ecosystem, from the language spec, to the interpreter, to packaging. And worse, the tools for everyone else have to cater to them.
I can see why someone who has never had a stable foundation to build a project on would view vibe coding as a good idea. When you're working in an ecosystem where any project can break at any time because some dependency pushed a breaking minor version bundled with a security fix for a catastrophic exploit, rolling the LLM gacha to see if it can get it working isn't the worst idea.
Since you mention JS specifically, I think it's important to separate that from the framework ecosystem. I'd suspect that most LLMs don't, which is part of the problem. I had a similar experience with Python lately, where the LLM-generated code (once I could get it to run) was something I would generously evaluate as "Excel VBA Macro quality". It does the task - for now - but I didn't learn much about what production-grade Python would look like.
This is an underrated comment. Whose job is it to do the thinking? I suppose it's still the software engineer, which means the job comes down to "code prompt engineer" and "test prompt engineer".
Wild times where a task that used to be described as "good at using google" now gets the title of "Engineer". It was bonkers enough when software devs co-opted the title.
I mean, building applications that are maintainable, fail gracefully, and keep costs low has all the same needs as any classic engineering discipline. You could spend just as much time designing a well thought out CLI as it could take to design a bridge or a sewer system.
Whether people do, or not, is a different question.
I just finished creating a multiplayer online party game using only Claude Code. I didn't edit a single line. However, there is no way someone who doesn't know how to code could get where I am with it.
You have to have an intuition for the sources of a problem. You need to be able to at least glance at the code and understand when and where the AI is flailing, so you know to backtrack or reframe.
Without that you are just as likely to totally mess up your app. Which also means you need to understand source control, when to save, and how to test methodically.
I was thinking of that, but asking the right questions and learning the problem domain just a little bit, "getting the gist of things", will help a complete newbie generate code for complex software.
For example, in your case there is the concept of message routing, where a message sent to the room is copied to all the participants.
You have timers, animation sheets, events, triggers, etc.
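The message-routing piece, for instance, could be sketched like this (a toy Python sketch; the names are made up for illustration, not taken from the actual game):

```python
# Toy sketch of room-based message routing: a message sent to the room is
# copied to every participant's inbox. All names here are hypothetical.
class Room:
    def __init__(self) -> None:
        self.inboxes: dict[str, list[str]] = {}  # player id -> received messages

    def join(self, player_id: str) -> None:
        self.inboxes[player_id] = []

    def broadcast(self, sender: str, message: str) -> None:
        # Copy the message to every participant, including the sender.
        for inbox in self.inboxes.values():
            inbox.append(f"{sender}: {message}")

room = Room()
room.join("alice")
room.join("bob")
room.broadcast("alice", "ready?")
assert room.inboxes["bob"] == ["alice: ready?"]
```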
A question that extracts such architectural decisions and relevant pieces of code will help the user understand what they are actually doing and also help debug the problems that arise.
It will of course take them longer, but it is possible to get there.
So I agree, but we aren't at that level of capability yet. Currently it inevitably hits a wall at some point, and you need to dig deeper to push it out of the rut.
Hypothetically, if you codified the architecture as a form of durable meta tests, you might be able to significantly raise the ceiling.
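One way to read "durable meta tests" is as architectural rules encoded as ordinary tests the agent can't quietly break. A sketch of what that could look like, assuming a Python layout with hypothetical `ui/` and `storage/` packages and a layering rule made up for illustration:

```python
# Hypothetical architectural "meta test": fail the suite if any module under
# ui/ imports from storage/ directly. The package names and the rule itself
# are illustrative, not from the project discussed above.
import ast
from pathlib import Path

FORBIDDEN_PREFIX = "storage"

def test_ui_does_not_import_storage() -> None:
    for source in Path("ui").rglob("*.py"):
        tree = ast.parse(source.read_text(), filename=str(source))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            else:
                continue
            for name in names:
                assert not name.startswith(FORBIDDEN_PREFIX), (
                    f"{source} imports {name}, which violates the layering rule"
                )
```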
Decomposing to interfaces seems to actually increase architectural entropy instead of decrease it when Claude Code is acting on a code base over a certain size/complexity.
So yes and no. I often just let it work by itself. Towards the very end when I had more of a deadline I would watch and interrupt it when it was putting implementations in places that broke its architecture.
I think only once did I ever give it an instruction that was related to a handful of lines (There certainly were plenty of opportunities, don't get me wrong).
When troubleshooting I did occasionally read the code. There was an issue with player-to-player matching where it was just kind of stuck, and I gave it a simpler solution (conceptually, not actual code) that worked for the design constraints.
I did find myself hinting/telling it to do things like centralize the CSS.
It was a really useful exercise in learning. I'm going to write an article about it. My biggest insight is that "good" architecture for a current-generation AI is probably different than for humans because of how attention and context work in the models/tools (at least for the current Claude Code). Essentially, "out of sight, out of mind" creates a dynamic where decomposing code leads to an increase in entropy when a model is working on it.
I need to experiment with other agentic tools to see how their context handling impacts possible scope of work. I extensively use GitHub Copilot, but I control scope, context, and instructions much tighter there.
I hadn't really used hands-off automation much in the past because I didn't think the models were at a level where they could handle a significantly sized unit of work. Now they can, with large caveats. There also is a clear upper bound with Claude Code, but that can probably be significantly improved by better context handling.
So if you're an experienced, trained developer you can now add AI as a tool to your skill set? This seems reasonable, but it is also a fundamentally different statement than what every. single. executive. is parroting to the echo chamber.
I have a strong memory from the start of my career, when I had a job setting up Solaris systems and there was a whispered rumour that one of the senior admins could read core files. To the rest of us, they were just junk that the system created when a process crashed and that we had to find and delete to save disk space. In my mind I thought she could somehow open the files in an editor and "read" them, like something out of the Matrix. We had no idea that you could load them into a debugger which could parse them into something understandable.
I once showed a reasonably experienced infrastructure engineer how to use strace to diagnose some random hangs in an application, and it was like he had seen the face of God.
(Anecdote) Best job I ever had: I walked in and they were like "yeah, we don't have any training or anything like that, but we've got a fully set-up lab and a rotating library of literature." <My Boss>: "Yeah, I'm not going to be around, but here are the office keys." Don't blow up the company, pretty much.
To be honest, I do find most manuals (man pages) horrible for quickly getting information on how to do something, and here LLMs do shine for me (as long as they don't mix up version numbers).
For man pages, you have to already know what you want to do and just want information on how exactly to do it. They're not for learning about the domain. You don't read the find manual to learn the basics of filesystems.
I mean the process either works or it doesn't. Meaning it either brings in the expected value with an acceptable level of defects, or it doesn't.
From a higher up’s perspective what they do is not that different from vibe coding anyway. They pick a direction, provide a high level plan and then see as things take shape, or don’t. If they are unhappy with the progress they shake things up (reorg, firings, hirings, adjusting the terminology about the end goal, making rousing speeches, etc)
They might realise that they bet on the wrong horse when the whole site goes down and nobody inside the company can explain why. Or when the hackers eat their face and there are too many holes to even say which one they came through. But these things regularly happen already with the current processes too. So it is more of a difference in degree, not kind.
I agree with this completely. I get the impression that a lot of people here think of software development as a craft, which is great for your own learning and development but not relevant from the company's perspective. It just has to work good enough.
Your point about management being vibe coding is spot on. I have hired people to build something and just had to hope that they built it the way I wanted. I honestly feel like AI is better than most of the outsourced code work I do.
One last piece, if anyone does have trouble getting value out of AI tools, I would encourage you to talk to/guide them like you would a junior team member. Actually "discuss" what you're trying to accomplish, lay out a plan, build your tests, and only then start working on the output. Most examples I see of people trying to get AI to do things fail because of poor communication.
> I get the impression that a lot of people here think of software development as a craft, which is great for your own learning and development but not relevant from the company's perspective. It just has to work good enough.
Building the thing may be the primary objective, but you will eventually have to rework what you've built (dependency changes, requirement changes, ...). All the craft is for that day, and whatever goes against that is called technical debt.
You just need to make some tradeoffs between getting the thing out as fast as possible and being able to alter it later. It's a spectrum, but instead of discussing it with the engineers, most executive suites (and their managers) want to give out edicts from on high.
> Building the thing may be the primary objective, but you will eventually have to rework what you've built (dependency changes, requirement changes, ...). All the craft is for that day, and whatever goes against that is called technical debt.
This is so good I just wanted to quote it so it showed up in this thread twice. Very well said.
Who's filling that role in this brave new world?