
With all due respect, you sound like someone who is just getting familiar with these tools. 100 more hours spent with AI coding and you will be much more productive. Coding with AI is a slightly different skill from coding, similar to how managing software engineers is different from writing software.


liszper:

> most SWE folks still have no idea how big the difference is between the coding agents they tried a year ago and declared useless, and ChatGPT 5 paired with Codex or Cursor today

Also liszper: oh, you tried the current approach and don’t agree with me? Well you just don’t know what you are doing.


Lol, what is up with everyone assuming there's no learning curve to these things? If you applied this argument to literally any other tool you would be laughed at, for good reason.


Probably because "there's no learning curve, they're just magic tools" is how they are marketed and how our managers expect them to work.


Sure, but people are allowed to have their own opinions too.


Yes, exactly. Learning new things is hard. Personally it took me about 200 hours to get started, and since then ~2500 hours to get familiar with the advanced techniques, and now I'm very happy with the results, managing extremely large codebases with LLMs in production.

For context before that I had ~15 years of experience coding the traditional way.


Has anyone else noticed the extreme dichotomy among developers using AI agents? Either the agents essentially don't work, or people are apparently running legions of them to produce some nebulous gigantic estate.

I think the crucial difference is that I do actually see evidence (i.e., the codebase) posted sometimes for the former; the latter could well be entirely mythos. A 24-day-old account evangelizing the legion-of-agents story does kind of fit the theme.


You should write a blog post about your learnings. If you could even give some high level highlights here that’d be really helpful.


How many users is "production", and how large is "extremely large"?


200k DAU, 7 million registered, ~50 microservices, large monorepo


You have 50 microservices for 200k daily users?

Let me guess: this has something to do with AI?


No, it has something to do with experience. The system is highly integrated with other platforms and has to stay afloat during burst loads.


... what is this thing, and can we see it?


You can OSINT me pretty easily; I'm not going to post it here, for the sake of anonymity against crawlers that train models on our conversations. Today's HN comments are tomorrow's coding LLMs.


Funnily enough, this is the same kind of approach you get from Lisp advocates and from the more annoying faction of Linux advocacy (which isn't as prevalent these days, it seems).


> the same kind of approach you get from Lisp

In what way? Lisp (Common Lisp) is the most stable and unchanging language out there. If you learned it anytime after the late 80s, you still know it, and will know it until the end of time. Meanwhile, here, we hear that "a year ago" is so much time that everything changed (for the better, of course).

Or is it about needing some serious time investment to get comfortable with Lisp? Even then, once you do spend enough time that s-exprs stop being a problem, that's it; there's nothing else to be getting comfortable with, and certainly, you won't need to relearn any of that a year into the future.

I don't think AI coding and Lisp are comparable, even considering just the tone of messages on the topic (as far as I can see, "smug lisp weenies" are a thing of the ancient past).


I'm also a lisper, yes.


> Be kind. Don't be snarky. Converse curiously; don't cross-examine. Edit out swipes.

https://news.ycombinator.com/newsguidelines.html


I'm starting to kind of dig C.C., but you're right: it definitely feels like a very confident, very ambitious high-schooler-level developer with infinite energy. You really have to give it very small tasks and constantly steer its direction. At the end of the day, I'm not sure I'm really saving that much time coaching Claude to do the job right vs. just writing the code myself, but it's definitely a neat trick.

The difference from an actual junior developer, of course, is that the human junior developer learns from his mistakes and gets better, but Claude seems to be stuck at the level of expertise of its model, and you have to wait for the model to improve before Claude improves.


The thing I am calling BS on is that there's much productivity gain in giving it very small tasks and constantly steering its direction. For 80% of code, I'm faster than it if that's what I have to do. For debugging? For telling it to integrate a new tool? Porting my legacy build system to something better? It's great at that, and it removes some important barriers to entry.


Bingo. All my experience is on Linux, and I've never written anything for Windows. So recently when I needed to port a small C program to Windows, I told ChatGPT "Hey, port this to Windows for me". I wouldn't trust the result, I'd rewrite it myself, but it let me find out which Win32 API functions I'd be calling, and why I'd be calling them, faster than searching MSDN would have done.
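For anyone curious, the flavor of mapping it surfaced was roughly the following. This is purely illustrative (a hypothetical file read, not my actual program), but it shows which Win32 calls stand in for the POSIX ones:

    /* POSIX open/read/close -> Win32 CreateFileA/ReadFile/CloseHandle */
    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        char buf[4096];
        DWORD got = 0;
        HANDLE h = CreateFileA("input.dat", GENERIC_READ, FILE_SHARE_READ,
                               NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFileA failed: %lu\n", GetLastError());
            return 1;
        }
        if (ReadFile(h, buf, sizeof buf, &got, NULL))
            printf("read %lu bytes\n", (unsigned long)got);
        CloseHandle(h);
        return 0;
    }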


I think it has more to do with the kind of software I write and its requirements than with spending more time with this current tool. For some things it's great, but it's been a net productivity loss for me on my core coding responsibilities.


ah, they are holding it wrong.

I am always so skeptical of this style of response, because if it takes hundreds of hours to learn to use something, how can it really be the silver bullet everyone was claiming earlier? Surely those people were still in the midst of their own 100 hours. And what else could we do if we spent 100 hours learning something? It's a lot of time, a huge investment, all on faith that things will get better.


How many hours did you spend mastering git, or your IDE, or your UX library of choice?


Then it's someone else's job to use these tools, not developers'.


I agree with your point. I think this is the reason most developers still don't get it: AI coding ultimately requires a "higher level" methodology.


"Hacker culture never took root in the 'AI' gold rush because the LLM 'coders' saw themselves not as hackers and explorers, but as temporarily understaffed middle-managers." [0]

This, this is you. This is the entire charade. It seems poetic somehow.

[0] https://news.ycombinator.com/item?id=45123094


I see myself as a hacker.


By your own exposition, you aren’t a hacker.


hackers can also cook and not become a chef


most SWE folks still have no idea how big the difference is between the coding agents they tried a year ago and declared useless, and ChatGPT 5 paired with Codex or Cursor today

thanks for the article, it's a good one


> most SWE folks still have no idea how big the difference is between the coding agents they tried a year ago and declared useless, and ChatGPT 5 paired with Codex or Cursor today

yes, just as was said each and every previous time OpenAI/Anthropic shit out a new model

"now it doesn't suck!"


Each and every new model expands the scope of what you can do. You notice that, get elated when things that didn’t work start working, then three weeks later the honeymoon period is over and you notice the remaining limits.

The hedonic treadmill ensures it feels the same way each time.

But that doesn’t mean the models aren’t improving, nor that the scope isn’t expanding. If you compare today’s tools to those a year ago, the difference is stark.


She's choosing GPT-5 as the good example? Maybe Claude, maybe...


I think most SWEs do have a good idea where I work.

They know that it's a significant, but not revolutionary, improvement.

If you supervise and manage your agents closely on well-scoped (small) tasks, they are pretty handy.

If you need a prototype and don't care about code quality or maintenance, they are great.

Anyone claiming 2x, 5x, 10x etc is absolutely kidding themselves for any non-trivial software.


I've found a pretty good speed-up just letting Claude Code run with a custom prompt to gather the context (relevant files, types, etc.) for the task, then having it put together a document with that context. (A sketch of the kind of command I mean follows below.)

It takes all of five minutes to run, and at the end I can review the document. If the task is small, I ask it to execute; if it actually requires me to do the work myself, well, now I have a reference with line numbers, some comments on how the system appears to work, what the intent is, areas of interest, etc.

I also rely heavily on the sequential thinking MCP server to give it more structure.
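For reference, here is a hypothetical version of that context-gathering prompt, assuming Claude Code's custom slash-command convention (a markdown file under .claude/commands/, with $ARGUMENTS substituted at invocation). The file name and wording are mine, not a prescription:

    # .claude/commands/gather.md -- invoked in a session as: /gather <task>

    Gather context for the following task: $ARGUMENTS

    1. List every relevant file, function, and type, with paths and line numbers.
    2. Summarize how the relevant code paths appear to work today.
    3. Note the apparent intent, areas of interest, and risks.

    Write the result to CONTEXT.md. Do not change any code.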

Edit:

I will say, because I think it's important: I've been a senior dev for a while now, and a lot of my job _is_ reviewing other people's pull requests. I don't find it hard or tedious at all.

Honestly it's a lot easier to review a few small "PRs" as the agent works than some of the giant PRs I'd get from team members before.


> I've been a senior dev for a while now, and a lot of my job _is_ reviewing other people's pull requests

I kind of hate that I'm saying this, but I'm sort of similar, and one thing I really like is having zero guilt about trashing the LLM's code. So often people are submitting something and the code is OK but just pervasively not quite how I like it. Some staff will engage in micro-arguments about things rather than just doing them how I want, and it's tiring. LLMs, meanwhile, are really good at explaining why they did stuff (or simulating that), and they will enthusiastically redo something and then help adjust their own AGENTS.md file to align better in the future.


> If you supervise and manage your agents closely on well-scoped (small) tasks, they are pretty handy

Compared to just doing it yourself though?

Imagine having to micromanage a junior developer like this to get good results

Ridiculous tbh


If the benefit is less than 2x, then we're talking about AI-assisted coding as a very, very expensive IntelliSense; a 1.x improvement just isn't much. My mind goes back to that study showing engineers claimed a 20% improvement while measurements showed a 20% reduction in productivity. This is all encouraging me to just keep using traditional tools.


The only AI-assisted software work I've seen actually have a benefit is the way my coworker uses Supermaven, where it's basically IntelliSense but it suggests filling in the function parameters as well. He'll type `MergeEx` and it will not just suggest `MergeExample(` as IntelliSense would, but also `MergeExample(oldExample, newExample, mergeOptions)`, based on which variable names in scope line up with the parameter types. Then he presses Tab and moves on, saving 10-15 seconds of typing. Repeat that multiple times through the day and it might be a 10% improvement, with no time lost fiddling with prompts to get the AI to correct its mistakes. (Here, if the suggestion is wrong, he just ignores it and keeps typing; the second he types a character that wasn't the next one in the suggestion, it goes away and a new suggestion may be calculated. The cognitive load of ignoring an incorrect suggestion is minimal.)


I've found it to be insanely productive for framework-based web development (currently working with Django); I'd say it's an easy 5-10x improvement in productivity there, though I still need to keep a somewhat close eye on it. It's not nearly as productive in my home-grown stuff; there it can actually be kind of annoying.


I'd argue this just proves my point.


It's true that I haven't been a hardcore agent-army vibe coder; I just try the popular ones once in a while in a naive way (isn't that the point of these tools, to have little friction?), Claude Code for example. And it's cool! But it's imperfect, and as this article attests, there's a lot of mental overhead to even have a shot at getting decent output. And even a decent output still needs to be reviewed and could include logical flaws.

I'd rather use it the other way around: I'm the one in charge, and the AI reviews my work for logical flaws or things I would have missed. I don't even have to think about the context window, since it only has to look at my new code logic.

So yeah, three years after the first ChatGPT and Copilot, I don't feel huge changes regarding "automated" AI programming, and I don't have any AI tool in my IDE. I prefer to have a chat on their website, to brainstorm or occasionally find a solution to something I'm stuck on.


I use agents for coding small stuff at work almost every day. I would say there has been some improvement compared to a year ago but it’s not any sort of step change. They still are only able to complete simple “intern-level” tasks around 50% of the time. Which is helpful but not revolutionary.


I still use Claude Code and Cursor and tbh still run into a lot of the same issues. Hallucinating code, hallucinating requirements, even when scoped to a very simple "make this small change".

It's good enough that it helps, particularly in areas or languages that I'm unfamiliar with. But I'm constantly fighting with it.


Last week I wanted to generate some test data for some unit tests of a certain function in a C codebase. It's an audio codec library, so I could have modified the function to dump its inputs to disk, run the library on any audio file, and hardcoded the input into the unit tests. Instead, I decided I wanted to save a few bytes and look into generating the dummy data dynamically. I wanted to try out Claude for generating the code that would generate the data, so to keep the context manageable I extracted the function and all its dependencies into a self-contained C program (less than 200 lines altogether) and asked it to write a function that would generate dummy data, in C.
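(For concreteness, what I was asking for is something in this vein: a fixed-seed generator, so the tests stay deterministic without shipping a recorded input file. The codec's real input format is elided; treat this as a sketch, not what I ended up with.)

    #include <stdint.h>
    #include <stddef.h>

    /* Linear congruential generator with a fixed seed: produces the same
       "audio" every run, so test expectations can be hardcoded. */
    static uint32_t lcg_next(uint32_t *state)
    {
        *state = *state * 1664525u + 1013904223u;
        return *state;
    }

    /* Fill n 16-bit samples with bounded pseudo-noise. */
    void fill_dummy_samples(int16_t *buf, size_t n)
    {
        uint32_t state = 0xC0DEC0DEu;
        for (size_t i = 0; i < n; i++)
            buf[i] = (int16_t)(lcg_next(&state) >> 16);
    }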

Impressively, it recognized the structure of the code and correctly identified it as a component of an audio codec library, and provided a reasonably complete description of many minute details specific to this codec and the work that the function was doing.

Rather less impressively, it decided to ignore my request and wrote a function that used C++ features throughout, such as type inference and lambdas, or should I say "lambdas", because it was actually just a function defined within a function that tried to access and mutate variables outside its own scope, as if we were writing JavaScript or something. Even apart from that, the code was rife with the sorts of warnings that even a default invocation of gcc would flag.

I can see why people would be wowed by this on its face. I wouldn't expect any average developer to have such a depth of knowledge and breadth of pattern-matching ability to be able to identify the specific task that this specific function in this specific audio codec was performing.

At the same time, this is clearly not a tool that's suitable for letting loose on a codebase without EXTREME supervision. This was a fresh session (no prior context to confuse it) using a tightly crafted prompt (a small, self-contained C program doing one thing) with a clear goal, and it still required constant handholding.

At the end of the day, I got the code working by editing it manually, but in an honest retrospective I would have to admit that the overall process actually didn't save me any time at all.

Ironically, despite how they're sold, these tools are infinitely better at going from code to English than going the other way around.


I feel this. I've had a few tasks now where in honest retrospect I find myself asking "did that really speed me up?". It's a bit demoralising because not only do you waste time, you also have a worse mental model of the resulting code and feel less sense of ownership over the result.

Brainstorming, ideation, and small, well-defined tasks where I can quickly vet the solution: these feel like the sweet spot for current frontier model capabilities.

(Unless you are pumping out some sloppy React SPA where you don't care about anything except getting it working as fast as possible; in that case, fine, get Claude Code to one-shot it.)


There’s been a lot of noise about Claude performance degradation, and the current best option is probably Codex, but this still surprises me. It sounds like it succeeded on the hard part, then stumbled on the easy bit.

Just two questions, if you don’t mind satisfying my curiosity.

- Did you tell it to write C? Or better yet, what was the prompt? You can use claude --resume to easily find it.

- Which model (Sooner or Opus)? Though I'd have expected either one to work.


Sooner -> Sonnet


> Ironically, despite how they're sold, these tools are infinitely better at going from code to English than going the other way around.

Yes. Decently useful (and reasonably safe) to red-team yourself with, but extremely easy to red-queen yourself with otherwise.


I tried again recently and I see absolutely no difference. If there's been some improvement, it's very subtle.

There's a big difference with their benchmarks and real world coding.


Funky how you can just generate these things in 2 minutes


lmao


Really nice, thanks for sharing!


Agentic AI is when an LLM uses tools. This is a minimal but complete example of that.
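The core of it is a short loop. Here's a sketch with the model call stubbed out (Agent-C shells out to curl against the real chat API at that point; the JSON plumbing is omitted, so the stub just scripts two canned replies to show the shape):

    #include <stdio.h>
    #include <string.h>

    /* Stub for the model. The real thing popens curl against a chat
       endpoint; here two canned replies demonstrate the protocol. */
    static const char *llm(const char *transcript)
    {
        if (strstr(transcript, "OUTPUT:") == NULL)
            return "RUN: uname -s";        /* model requests a tool call */
        return "ANSWER: you appear to be on a Unix-like system";
    }

    int main(void)
    {
        char transcript[8192] = "USER: what OS am I on?\n";
        for (int turn = 0; turn < 8; turn++) {   /* cap turns to avoid loops */
            const char *reply = llm(transcript);
            if (strncmp(reply, "RUN: ", 5) == 0) {
                char out[1024];
                FILE *p = popen(reply + 5, "r"); /* the tool: a shell command */
                size_t n = p ? fread(out, 1, sizeof out - 1, p) : 0;
                out[n] = '\0';
                if (p) pclose(p);
                /* feed the tool's output back into the context */
                snprintf(transcript + strlen(transcript),
                         sizeof transcript - strlen(transcript),
                         "%s\nOUTPUT: %s\n", reply, out);
            } else {
                puts(reply);                     /* final answer */
                break;
            }
        }
        return 0;
    }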


Bro, you gave it all the tools. I wouldn't call this minimal. Throw it in a docker container and call it good. :)


Thank you, fixed that!

curl was cheating, yes; I might go zero-dependency in the future.

Working on minimal local training/inference too. The goal of these experiments is to have something completely independent.


Yes, it is fun to create small but mighty executables. I intentionally kept everything barebones and hardcoded, because I assumed that if you are interested in using Agent-C, you will fork it and make it your own, adding whatever is important to you.

This is a demonstration that AI agents can be 4KB and fun.


You should still not compromise on error reporting, for example. The user would not know whether a failure occurred because the /tmp file couldn't be created, the URL was wrong, DNS failed, the response was unexpected, etc. These are things you can lose hours troubleshooting, so I would not fork it and make it my own if I have to add all of that myself. (A sketch of the minimum I'd want follows below.)

I also disagree that it's small but mighty: you popen curl, which does the core task. I'm not sure, but a bash script might come out even smaller (particularly if you compress it and make it self-expanding).
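To be constructive, here's a sketch of the minimum I'd want, assuming the popen-curl approach stays: check pclose's status and map curl's documented exit codes (6 = DNS failure, 7 = connect failure, 22 = HTTP error when -f is passed) to real messages:

    #include <stdio.h>
    #include <sys/wait.h>

    /* Run a curl command line and say *why* it failed, not just that it did.
       Assumes the command uses -f so HTTP errors show up in the exit code. */
    static int fetch(const char *cmd)
    {
        FILE *p = popen(cmd, "r");
        if (!p) { perror("popen"); return -1; }
        int c;
        while ((c = fgetc(p)) != EOF)
            putchar(c);
        int status = pclose(p);
        if (status == -1) { perror("pclose"); return -1; }
        if (WIFEXITED(status)) {
            switch (WEXITSTATUS(status)) {
            case 0:  return 0;
            case 6:  fputs("error: DNS resolution failed\n", stderr); break;
            case 7:  fputs("error: could not connect to host\n", stderr); break;
            case 22: fputs("error: server returned an HTTP error\n", stderr); break;
            default: fprintf(stderr, "error: curl exited %d\n", WEXITSTATUS(status));
            }
        }
        return -1;
    }

    int main(void) { return fetch("curl -sf https://example.com/") ? 1 : 0; }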


Updated to CC0!


That’s great! Really cool project too.

