Your claim here being that Kenton Varda isn't reading the code he's generating. ...

kiitos · 2025-06-07T16:27:46 1749313666

No, that's not at all my claim, as it's obvious from the commit history that Kenton is reading the code he's generating before committing it.

kentonv · 2025-06-07T16:35:26 1749314126

What do you mean by "one degree removed from 100% pure vibe coding", then? The definition of vibe coding is letting the AI code without review...

kiitos · 2025-06-07T16:45:56 1749314756

> one degree removed

You're letting Claude do your programming for you, and then sweeping up whatever it does afterwards. Bluntly, you're off-loading your cognition to the machine. If that's fine by you then that's fine enough, it just means that the quality of your work becomes a function of your tooling rather than your capabilities.

kentonv · 2025-06-07T20:52:13 1749329533

I don't agree. The AI largely does the boring and obvious parts. I'm still deciding what gets built and how it is designed, which is the interesting part.

tptacek · 2025-06-07T21:45:19 1749332719

It's the same with me, with the added wrinkle of pulling each PR branch down and refactoring things (and, ironically, introducing my own bugs).

kiitos · 2025-06-07T21:47:33 1749332853

> I'm still deciding what gets built and how it is designed, which is the interesting part.

How, exactly? Do you think that you're "deciding what gets built and how it's designed" by iterating on the prompts that you feed to the LLM that generates the code?

Or are you saying that you're somehow able to write the "interesting" code, and can instruct the LLM to generate the "boring and obvious" code that needs to be filled-in to make your interesting code work? (This is certainly not what's indicated by your commit history, but, who knows?)

kentonv · 2025-06-08T00:10:24 1749341424

Did you actually read the commit history?

My prompts specify very precisely what should be implemented. I specified the public API and high-level design upfront. I let the AI come up with its own storage schema initially but then I prompted it very specifically through several improvements (e.g. "denormalize this table into this other table to eliminate a lookup"). I designed the end-to-end encryption scheme and told it in detail how to implement it. I pointed out bugs and explained how to fix them. And so on.

All the thinking happened in those prompts. With the details I provided, combined with the OAuth spec, there was really very little room left for any creativity in the code. It was basically connect-the-dots at that point.

kiitos · 2025-06-08T18:12:21 1749406341

Right, so -- 'you think that you're "deciding what gets built and how it's designed" by iterating on the prompts that you feed to the LLM that generates the code'

> My prompts specify very precisely what should be implemented.

And the precision of your prompt's specifications, has no reliable impact on exactly what code the LLM returns as output.

> With the details I provided, combined with the OAuth spec, there was really very little room left for any creativity in the code. It was basically connect-the-dots at that point.

I truly don't know how you can come to this conclusion, if you have any amount of observed experience with any of the current-gen LLM tools. No amount of prompt engineering gets you a reliable mapping from input query to output code.

> I designed the end-to-end encryption scheme and told it in detail how to implement it. I pointed out bugs and explained how to fix them. And so on.

I guess my response here is that, if you think that this approach to prompt engineering gets you a generated code result that is in any sense equivalent, or even comparable, in terms of quality, to the work that you could produce yourself, as a professional and senior-level software engineer, then, man, we're on different planets. Pointing out bugs and explaining how to fix them in your prompts in no way gets you deterministic, reliable, accurate, high-quality code as output. And actually forget about high-quality, I mean even just bare minimum table-stakes requirements-satisfying stuff.. !

tptacek · 2025-06-08T18:23:51 1749407031

Nobody has claimed to be getting deterministic outputs from LLMs.

kiitos · 2025-06-08T18:57:06 1749409026

> My prompts specify very precisely what should be implemented. I specified the public API and high-level design upfront. I let the AI come up with its own storage schema initially but then I prompted it very specifically through several improvements (e.g. "denormalize this table into this other table to eliminate a lookup"). I designed the end-to-end encryption scheme and told it in detail how to implement it. I pointed out bugs and explained how to fix them. And so on.

OK. Replace "[expected] deterministic output" with whatever term best fits what this block of text is describing, as that's what I'm talking about. The claim is that a sufficiently-precisely-specified prompt can produce reliably-correct code. Which is just clearly not the case, as of today.

tptacek · 2025-06-08T19:25:27 1749410727

I don't even think anybody expects reliably-correct code. They expect code that can be made as reliably as they themselves could make code, with some minimal amount of effort. Which clearly is the case.

kiitos · 2025-06-08T20:10:38 1749413438

Forget about reliably-correct. The code that any current-gen LLM generates, no matter how precise the prompt it's given, is never even close to the quality standards expected of any senior-level engineer, in any organization I've been a part of, at any point in my career. They very much never produce code that is as good as what I can create. If the LLM-generated code you're seeing passes this level of muster, in your view, then that's really a reflection on your situation(s), and 100% not any kind of truth that you can claim as part of a blog post or whatever...

kentonv · 2025-06-08T21:59:02 1749419942

> The code that any current-gen LLM generates, no matter how precise the prompt it's given, is never even close to the quality standards expected of any senior-level engineer, in any organization I've been a part of, at any point in my career.

You are just making assertions here with no evidence.

If you prompt the LLM for code, and then you review the code, identify specific problems, and direct the LLM to fix those problems, and repeat, you can, in fact, end up with production-ready code -- in less time than it would take to write by hand.

Proof: My project. I did this. It worked. It's in production.

It seems like you believe this code is not production-ready because it was produced using an LLM which, you believe, cannot produce production-ready code. This is a cyclic argument.

kiitos · 2025-06-09T18:28:10 1749493690

> If you prompt the LLM for code, and then you review the code, identify specific problems, and direct the LLM to fix those problems, and repeat, you can, in fact, end up with production-ready code

I guess I will concede that this is possible, yes. I've never seen it happen, myself, but it could be the case, at some point, in the future.

> in less time than it would take to write by hand.

This is my point of contention. The process you've described takes ages longer than however much time it would take a competent senior-level engineer to just type the code from first principles. No meaningful project has ever been bottle-necked on how long it takes to type characters into editors.

All of that aside, the claim you're making here is that, speaking as a senior IC, the code that an LLM produces, guided by your prompt inputs, is more or less equivalent to any code that you could produce yourself, even controlling for time spent. Which just doesn't match any of my experiences with any current-gen LLM or agent or workflow or whatever. If your universe is all about glue code, where typing is enemy no. 1, and details don't matter, then fair enough, but please understand that this is not usually the domain of senior-level engineers.

simonw · 2025-06-09T20:41:03 1749501663

"the code that an LLM produces, guided by your prompt inputs, is more or less equivalent to any code that you could produce yourself, even controlling for time spent"

That's been my personal experience over the past 1.5 years. LLMs, prompted and guided by me, write code that I would be proud to produce without them.

kentonv · 2025-06-10T01:09:30 1749517770

I have only claimed that for this particular project it worked really well, and was much faster than writing by hand. This particular project was arguably a best-case scenario: a greenfield project implementing a well-known standard against a well-specified design.

I have tried using AI to make changes to the Cloudflare Workers Runtime -- my usual main project, which I started, and know like the back of my hand, and which incidentally handles over a trillion web requests every day -- and in general in that case I haven't found it saved me much time. (Though I've been a bit surprised by the fact that it can find its way around the code at all, it's a pretty complicated C++ codebase.)

It really depends on the use case.

59nadir · 2025-06-09T08:09:59 1749456599

It's possible kiitos has (or had?) a higher standard in mind for what should constitute a senior/"lead engineer" at Cloudflare and how much they should be constrained by typing as part of implementation.

Out of interest: How much did the entire process take and how much would you estimate it to take without the LLM in the loop?

kentonv · 2025-06-09T13:22:00 1749475320

> It's possible kiitos has (or had?) a higher standard in mind for what should constitute a senior/"lead engineer" at Cloudflare and how much they should be constrained by typing as part of implementation.

See again here, you're implying that I or my code is disappointing somehow, but with no explanation for how except that it was LLM-assisted. I assert that the code is basically as good as if I'd written it by hand, and if you think I'm just not a competent engineer, like, feel free to Google me.

It's not the typing itself that constrains, it's the detailed but non-essential decision-making. Every line of code requires making several decisions, like naming variables, deciding basic structure, etc. Many of these fine-grained decisions are obvious or don't matter, but it's still mentally taxing, which is why nobody can write code as fast as they can type even when the code is straightforward. LLMs can basically fill in a bunch of those details for you, and reviewing the decisions -- especially the fine-grained ones that don't matter -- is a lot faster than making them.

> How much did the entire process take and how much would you estimate it to take without the LLM in the loop?

I spent about five days mostly focused on prompting the LLM (although I always have many things interrupting me throughout the day, so I wasn't 100% focused). I estimate it would have taken me 2x-5x as long to do by hand, but it's of course hard to say for sure.

59nadir · 2025-06-10T08:16:40 1749543400

> See again here, you're implying that I or my code is disappointing somehow, but with no explanation for how except that it was LLM-assisted. I assert that the code is basically as good as if I'd written it by hand, and if you think I'm just not a competent engineer, like, feel free to Google me.

I think you're reading a bit too deeply into what I wrote; I explained what I interpreted kiitos' posts as essentially saying. I realize that you've probably had to deal with a lot of people being skeptical to the point of "throwing shade", as it were, so I understand the defensive posture. I am skeptical, but the reason I'm asking questions (alongside the previous bit) is because I'm actually curious about your experiment.

> It's not the typing itself that constrains, it's the detailed but non-essential decision-making. Every line of code requires making several decisions, like naming variables, deciding basic structure, etc. Many of these fine-grained decisions are obvious or don't matter, but it's still mentally taxing, which is why nobody can write code as fast as they can type even when the code is straightforward. LLMs can basically fill in a bunch of those details for you, and reviewing the decisions -- especially the fine-grained ones that don't matter -- is a lot faster than making them.

In your estimation, what is your mental code coverage of the code you ended up with? Do you feel like you have a complete mapping of it, i.e. you could get an external request for change and map it quickly to where it needs to be made and why exactly there?

kentonv · 2025-06-10T13:50:02 1749563402

> In your estimation, what is your mental code coverage of the code you ended up with? Do you feel like you have a complete mapping of it, i.e. you could get an external request for change and map it quickly to where it needs to be made and why exactly there?

I know the code structure about as well as if I had written it.

Honestly the code structure is not very complicated. It flows pretty naturally from the interface spec in the readme, and I'd expect anyone who knows OAuth could find their way around pretty easily.

But yes, as part of prompting improvements to the code, I had to fully understand the implementation. My prompts are entirely based on reading the code and deciding what needed to be changed -- not based on any sort of black-box testing of the code (which would be "vibe coding").

bdangubic · 2025-06-08T22:09:17 1749420557

100% this. I have same proof… In productions… across 30+ services… hourly…

tptacek · 2025-06-08T22:42:58 1749422578

The genetic fallacy is a hell of a drug.

ramchip · 2025-06-08T01:02:28 1749344548

Personally, I spend _more_ time thinking with Claude. I can focus on the design decisions while it does the mechanical work of turning that into code.

Sometimes I give the agent a vague design ("make XYZ configurable") and it implements it the wrong way, so I'll tell it to do it again with more precise instructions ("use a config file instead of a CLI argument"). The best thing is you can tell it after it wrote 500 lines of code and updated all the tests, and its feelings won't be hurt one bit :)

It can be useful as a research tool too, for instance I was porting a library to a new language, and I told the agent to 1) find all the core types and 2) for each type, run a subtask to compare the implementation in each language and write a markdown file that summarizes the differences with some code samples. 20 min later I had a neat collection of reports that I could refer to while designing the API in the new language.

steveklabnik · 2025-06-08T16:36:55 1749400615

I was recently considering a refactoring in a work codebase. I had an interesting discussion with Claude about the tradeoffs, then had it show me what the code would look like after the refactor, both in a very simple case, and in one of the most complex cases. All of this informed what path I ended up taking, but especially the real-world examples meant this was a much better informed decision than just "hmm, yeah seems like it would be a lot of work but also probably worth it."

rossjudson · 2025-06-09T23:48:53 1749512933

You ever get the feeling someone didn't look up Kenton Varda before criticizing the code he's generating?

I guarantee you that Kenton Varda's generators generate more code than any other code generators that aren't compilers. ;)