
I really would like an answer to this.

My CTO is currently working on the ability to run several dockerised versions of the codebase in parallel for this kind of flow.
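What I imagine that looks like, purely a guess at the shape of it (the image name, paths and task runner below are all hypothetical): one git worktree and one container per task, so parallel runs can't trample each other.

    # Hypothetical sketch: one isolated checkout + container per task.
    # Nothing here refers to a real tool; "agent-image" and "run-task"
    # are placeholders.
    import subprocess

    def spawn_task_sandbox(repo: str, task_id: str, base_branch: str = "main") -> None:
        worktree = f"/tmp/agents/{task_id}"
        # isolated checkout for this task
        subprocess.run(
            ["git", "-C", repo, "worktree", "add",
             "-b", f"agent/{task_id}", worktree, base_branch],
            check=True,
        )
        # isolated container, mounting only that checkout
        subprocess.run(
            ["docker", "run", "-d", "--name", f"agent-{task_id}",
             "-v", f"{worktree}:/workspace",
             "agent-image",          # hypothetical image
             "run-task", task_id],   # hypothetical entrypoint args
            check=True,
        )

    for task in ("issue-101", "issue-102", "issue-103"):
        spawn_task_sandbox("/path/to/repo", task)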

I’m here wondering how anyone could work on several tasks at once at a speed where they can read, review and iterate on the output of one LLM in the time it takes for another LLM to spit out an answer for a different task.

Like, are we just firing off prompts as fast as possible and hoping the unchecked output turns out fine? Are others able to context switch on every prompt without a reduction in quality? Why are people tackling the problem of prompting at scale as if the bottleneck were token output rather than human reading and reasoning?

If this was a random vibecoding influencer I’d get it, but I see professionals trying this workflow and it makes me wonder what I’m missing.



I was going to say that this is how genetic algorithms work, but there is still too much human in the loop.

Maybe code husbandry?


Code Husbandry is a good term for what I've been thinking about how to implement. I hope you don't mind if I steal it. Think automated "mini agents", each with a defined set of tools and tasks, responding to specific triggers.

Imagine one agent just does docstrings - on commit, build an AST, branch, write/update comments accordingly, push and create a merge request with a standard report template.

Each of these mini-agents has a defined scope and operates in its own environment, and can be customized/trained as such. They just run continuously on the codebase based on their rules and triggers.

The idea is that all these changes bubble up to the developer for approval, just maybe after a few rounds of LLM iteration. The hope is that small models, each with a narrow scope, can be pushed to higher-quality output and run asynchronously.
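Roughly what I have in mind for the docstring agent; run_llm, changed_python_files and open_merge_request are placeholders, not an existing library, and the trigger/branch/MR flow is just the one described above.

    # Hypothetical sketch of a single-purpose "docstring agent".
    import ast
    import subprocess

    def on_commit(repo: str, sha: str) -> None:
        branch = f"docstring-agent/{sha[:8]}"
        subprocess.run(["git", "-C", repo, "checkout", "-b", branch], check=True)
        for path in changed_python_files(repo, sha):   # placeholder helper
            source = open(path).read()
            tree = ast.parse(source)
            # only touch functions that are missing a docstring
            missing = [n.name for n in ast.walk(tree)
                       if isinstance(n, ast.FunctionDef)
                       and ast.get_docstring(n) is None]
            if not missing:
                continue
            updated = run_llm(                          # placeholder LLM call
                instruction="Add docstrings to the listed functions; change nothing else.",
                source=source,
                targets=missing,
            )
            open(path, "w").write(updated)
        subprocess.run(["git", "-C", repo, "commit", "-am", "docs: docstring agent pass"], check=True)
        subprocess.run(["git", "-C", repo, "push", "-u", "origin", branch], check=True)
        open_merge_request(branch, template="docstring-agent-report.md")  # placeholder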


My assumption lately is that this workflow is literally just “it works, so merge”. Running multiple in parallel does not allow for inspection of the code, just for testing functional requirements at the end.


Hmm, I haven’t managed to make it work yet, and I’ve tried. The best I can manage is three completely separate projects, and they all get only divided attention (which is often good enough these days).


Do you feel you get a faster/better end result than focusing on a single task at a time?

I can’t help but feel it’s like texting and driving, where people are overvaluing their ability to function with reduced focus. But obviously I have zero data to back that up.



