> LLMs are very useful tools for software development
That's an opinion many disagree with. As a matter of fact, the only study to date, limited as it is, showed that LLM usage decreases productivity for experienced developers by roughly 19%. Let's reserve opinions and link studies.
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-o...
My anecdotal experience, for example, is that LLMs are such a negative drain on both time and quality that one has to be really early in their career to benefit from their usage.
I wouldn't call myself an 'experienced' developer, but I do find LLMs useful for once-off things, where I can't justify the effort to research and implement my own solution. Two recent examples come to mind:
1. Converting exported data into a suitable import format based on a known schema
2. Creating syntax highlighting rules for a language not natively supported in a Typst report
Neither situation had an existing solution, and while the outputs were not exactly correct, they only needed minor adjustments.
In any other situation, I'd generally prefer to learn how to do the thing myself, since understanding how to do something can sometimes be as important as the result.
People who get overly excited about every new shiny thing also thought that NFTs and crypto and web3 (whatever the heck that means) were the next coming of Jesus.
If LLM boosters were not so preachy about it, I'd let them off the hook more easily. But at the moment:
- The companies developing LLMs haven't made that part of their business profitable, nor do they have any path to profitability. You can see it in Anthropic's constantly changing token limits and plans, and in Microsoft and OpenAI being unable to reach a deal: https://www.ft.com/content/b81d5fb6-26e9-417a-a0cc-6b6689b70...
When the LLM cultists wake up during the bubble pop, I wonder what they are gonna jump on next. The world is running out of hype bandwagons to jump on. Maybe... LLM NFTs?
And other people said the Internet was a fad and a bubble and cell phones were just for people who wanted to look important and solar panels would never work.
History is full of people making wrong predictions in both directions about new technology.
As the most obvious parallel, Pets.com went bust in the first dot-com crash, and so did Webvan. Today Chewy is successfully replicating Pets.com, and ordering groceries online for delivery is common.
We might see 1,000 different AI companies go bankrupt in the next few years, but still have AI be a huge chunk of the economy throughout the 2030s.
So what? A broken clock is still right twice a day.
My point is: are there more false positives or false negatives?
You're also cherry picking. Even with AI there are pretty high profile people saying it's a fad as well as pretty high profile people saying it's going to kill us all.
Look at crypto. WAS it hype? Clearly. Don't tell me the price of bitcoin, tell me how the technology actually changed the world. Tell me how it did even a tenth of what the crypto bros promised.
Look at VR. Don't tell me how much you like the latest Quest, tell me how many people are in the metaverse. Tell me how many people even own a VR system. Tell me how the tech achieved a hundredth of what was promised.
Look at Segway. When was the last time you even saw one? Have you ever even used one? How many people even know what they are?
It doesn't matter if your prediction is right if it is 1 in 1000. The Simpsons have a better batting record than that, and they aren't even trying. What matters is consistent predictions. Even if you believe this time is different, I don't know how you can fail to understand why people are skeptical. In the last decade we watched people become billionaires off of VR and crypto.
Even if AI is different, people are being celebrated for their experience in crypto and VR as evidence that they'll be successful in AI. If you believe in AI, then why wouldn't you see this as a fox in the chicken coop?
Those people didn't make their billions through technology, they made their billions through hype.
You can believe AI is a bubble and full of hype even if you believe the technology has a lot of uses. It's a lot easier to build hype around a grain of truth than a complete fabrication.
If we put people in a jet with poor training and they crash, that's the pilot's fault, yet if people crash LLMs, that's the LLM's fault. If a study showed 95% of people crashed jets without training, I wouldn't take that as a sign jets are a flawed idea.
As it is, I have no problem with your naysaying, I'm getting results, your disbelief doesn't change that, in fact I find it more amusing than anything.
> If we put people in a jet with poor training and they crash, that's the pilot's fault
Yet if you make a plane and say that no training or license is needed to operate it, and people crash, then yes, it is the plane's fault.
> I'm getting results, your disbelief doesn't change that
Many of these studies show participants self-rating as more productive and getting things done more quickly. But self-reporting is well known to be an unreliable metric. People frequently hallucinate. People self-report seeing ghosts, aliens, demons, Bigfoot, past lives, and all kinds of things that don't exist. Most of these people aren't lying, either; they believe the things they report. Most people (probably everybody) have experienced the Mandela Effect in some form or another. Hell, we even know eyewitness testimony can be unreliable and even manipulated/influenced.
I've seen plenty around me who claim to be faster with the help of AI. Some are! But most seem to be faster at producing lines of code, not faster at completing the goal. I see a lot more slop and frequently that slop just results in work being outsourced to others. Which, to be fair, does mean they're "faster". But their speed is not on the intended metric.
Maybe I'm the one hallucinating. But maybe you are too. All I know is that when I use AI tools I feel faster, but I've also found I get a lot less done.
I don't need vibes to gauge my productivity improvement; I can see it in the 18 projects and 3 scientific papers I've gotten close to completing in the last 2.5 weeks.
Great! If you've got more to go on than vibes, then that's a great sign. Tons of people measure their performance in vibes, frankly, because performance is a really difficult thing to measure.
But we're all just talking to “some random dude on the internet” and that context isn't shared. I'm certain you see both people for whom AI is helping and people for whom it isn't, maybe in different proportions than others. But if you're upset that I don't know you, well... that's a bit hard to do anything about in forums like this.
I'm not upset that you don't know me, and I'm fine with people saying AI isn't for them. I even acknowledge that cursor jockeying is only marginally better than hand coding in many cases.
That's a very different situation from having an intensive conversation with an AI to generate a formalized CUE spec with correctness guards, E2E testing specifications, etc, then decomposing that spec into lanes and dispatching a swarm of agents to build it, review work, implement e2e tests/QA, etc. They're both AI, but one is vibe coding and one is autonomous engineering.
Ah, there it is. The excuses. It's scrum all over again.
Mystical practitioners discount the studies done in the open as not fair, or not done right, or claim the participants just didn't believe in it hard enough.
> As it is, I have no problem with your naysaying, I'm getting results, your disbelief doesn't change that
Studies and evidence mean nothing to cults and religious believers, so yeah, I am happy that you feel like you have a personal connection to the higher being that is the LLM. Keep the faith!
I don't care about the failures of others. I am succeeding. A study in 1920 would have proved that a man couldn't run a 4-minute mile; imagine if people had stopped trying.
> decreases productivity for experienced developers by roughly 19%.
Seems about right when trying to tell an LLM what to code. But flipping the script, letting the LLM tell you what to code, productivity gains seem much greater. Like most programmers will tell you: Writing code isn't the part of software development that is the bottleneck.
Having the LLM write the code is a bit crazy, except maybe for writing tedious test case variations.
But asking an LLM questions about a well-established domain you're not expert in is a fantastic use case. And very relevant for making software. In practice, most software requires you to understand the domain you're aiming to serve.
I use LLMs every day. They are useful to me (quite useful), but I don’t really use them for coding. I use them as a “fast reference” source, an editor, or as a teacher.
I’ve been at this since 1983, so I’ve weathered a couple of sea changes.
My favorite use of LLMs is as a fuzzy search. Give them a description, search on that, iterate. Or get them to role-play an expert in some field. It doesn't matter if they hallucinate: take that jargon and use it to improve your searches.
They're super helpful in these contexts. But these are also contexts where I don't need to rely on accuracy.
> showed that LLM usage decreases productivity for experienced developers by roughly 19%.
That’s a massive overstatement of what the study found. One big caveat is this: “our developers typically only use Cursor for a few dozen hours before and during the study.” In other words, the 19% slowdown could simply be a learning curve effect.
> one has to be really early in their career to benefit from their usage.
I have decades of experience, and find them very beneficial. But as with any tool, it helps to understand what they are and aren't good at, and how to use them effectively. That knowledge comes with experience.
Be careful of dismissing a new tool just because you haven’t figured out how to use it effectively.
LLMs help a lot with 'well-defined' tasks, things you already know you want; they just accelerate the development. You still have to rewrite some of it, but they do the boring stuff fast.
They are not great if your tasks are not well defined. Sometimes they surprise you with great solutions; sometimes they produce a mess that just wastes your time and deviates from your mission.
To me, LLMs have been great accelerants when you know what you want and can define it well. Otherwise, they can waste your time by creating a lot of code slop that you will have to rewrite anyway.
One huge positive side effect: often, when you create a component (e.g. a UI element or feature), you need a setup to test it, view controllers, data, all of which is boring, annoying, and time-wasting to deal with. An LLM can do that for you within seconds (even creating mock data), and since this is mostly test code, it doesn't matter if the code quality isn't great; it just matters to get something on the screen to test the real functionality.
AI/LLMs have been a huge time saver for this part.
I get the impression that the software scenarios where LLMs do the best on both reliability and time-saving are places where a task was already ripe (or overdue) to be abstracted away: turned into a reusable library; made a default implementation or setting; expressed as a shorter DSL; or captured in a template/generator script.
When it's a problem lots of people banged their head against and wrote posts about similar solutions, that makes for good document-prediction. But maybe we should've just... removed the pain-point.
How do you find the quality of the Haskell code produced by the LLM? Also, how do you use the LLM when coding Haskell? Generating single functions, or more?
I'm in a similar situation. I write Haskell daily and have been working with Haskell for a bunch of years.
Though I use Claude Code. The setup is mostly stock, though I do have a hook that feeds the output of `ghciwatch` back into Claude directly after editing. I think this helps.
- I find the code quality to be so-so. It leans much more on if-then-else than idiomatic style, and is a bit too yolo for my liking.
- I don't rely on it for making architectural decisions. We do discuss them when I'm unsure, though.
- I do not use it for critical things such as data migrations. I find that the errors it makes are easy to miss, and not mistakes I would make myself.
- I let it build less sensitive "leaves" more freely.
- If you define the tasks well with types then it works fairly well.
- Claude is very prone to writing tests that test nothing (see the sketch after this list). Last week it wrote a test that put 3 tuples of strings in a list and checked the length of the list and that none of the strings were empty. A slight overfit on untyped languages :)
- In my experience, the uplift from Opus vs Sonnet is much larger when doing Haskell than JS/Python.
- It matters a lot if the project is well structured.
- I think there is plenty of room to improve with better setup, even without models changing.
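To make the "tests that test nothing" point concrete, here is roughly the shape of it as a sketch (module and fixture names are invented here, not the actual test): the spec only inspects its own hard-coded fixture and never calls the code under test.

```haskell
module VacuousSpec (spec) where

import Test.Hspec

spec :: Spec
spec = describe "user records" $
  it "builds the expected records" $ do
    -- Hard-coded fixture; the code under test is never called.
    let records = [("alice", "admin"), ("bob", "dev"), ("carol", "qa")]
    length records `shouldBe` 3
    all (\(name, role) -> not (null name) && not (null role)) records
      `shouldBe` True
```

It passes, it looks busy, and it verifies nothing about the actual code.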
I'm stuck in my ways with vim/tmux/ghci etc, so I'm not using some AI IDE. I write stuff into ChatGPT and use the output, copying manually, or writing it myself with inspiration from what I get. I feed it a fair bit of context (like, say, a production module with a load of database queries, and the associated spec module) so that it copies the structure and patterns that I've established.
The quality of the Haskell code is about as good as I would have written myself, though I think it falls for primitive obsession more than I would. Still, I can add those abstractions myself after the fact.
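For example, the primitive-obsession fix usually ends up being a couple of newtypes added after the fact (a generic sketch with invented names, not code from my project), so GHC can reject swapped arguments:

```haskell
module Ids where

-- Bare Strings everywhere is the primitive obsession; newtypes cost a few
-- lines and let GHC catch mixed-up arguments.
newtype UserId  = UserId String  deriving (Eq, Show)
newtype OrderId = OrderId String deriving (Eq, Show)

-- With bare Strings the two arguments could be swapped silently;
-- with newtypes, a swap is a compile error.
lookupOrder :: UserId -> OrderId -> IO ()
lookupOrder (UserId uid) (OrderId oid) =
  putStrLn ("fetching order " <> oid <> " for user " <> uid)
```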
Maybe one of the reasons I'm getting good results is because the LLM effectively has to argue with GHC, and GHC always wins here.
I've found that it's a superpower also for finding logic bugs that I've missed, and for writing SQL queries (which I was never that good at).
“GHC always wins” is a nice sentiment. Something similar happens when I have written QuickCheck tests and get the LLM to make the implementation conform. QuickCheck almost always wins that fight as well.
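As a minimal sketch of that workflow (the encode/decode pair below is an invented stand-in, not a real project): I write the property first, and the LLM's implementation has to keep iterating until quickCheck stops finding counterexamples.

```haskell
module RoundTrip where

import Test.QuickCheck

-- Stand-ins for whatever the LLM is asked to implement;
-- the property below is the part I write by hand.
encode :: String -> String
encode = reverse

decode :: String -> String
decode = reverse

-- The property is the referee: the generated implementation has to survive
-- hundreds of random inputs, not just a hand-picked example.
prop_roundTrip :: String -> Property
prop_roundTrip s = decode (encode s) === s

main :: IO ()
main = quickCheck prop_roundTrip
```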
I use a similar style to yours: neovim with ghci inside, plus HLS and ghciwatch.
Claude Code is nice because it is just a separate CLI tool that doesn't force you to change editors, etc. It can also research things for you, make plans that you can iterate on before letting it loose, and so on.
Claude is also better than ChatGPT at writing Haskell, in my experience.
That's a skill issue. That lone study was observing untrained participants.
It's no surprise to me that devs who are accustomed to working on one thing at a time, due to fast feedback loops, have not learned to parallelize their work (something that has been demonized at agile-style organizations) and instead sit and wait on agents and start watching YouTube, as the study found (the productivity hits were due to participants looking at fun non-work stuff instead of attempting to parallelize any work).
The study reflects usage of emergent tools without training, and with regressive training on previous generation sequential processes, so I would expect these results. If there is any merit in coordinating multiple agents on slower feedback work, this study would not find it.
Interesting take. I suggest an alternative take: it's a skill issue if LLMs help a developer.
If the study showed that experienced developers suffered a negative performance impact while using an LLM, maybe where LLMs shine is with junior developers?
Until a new study that shows otherwise comes out, it seems the scientific conclusion is that junior developers, the ones with the skill issues, benefit from using LLMs, while more experienced developers are impacted negatively.
I look forward to any new studies that disprove that, but for now it seems settled. So you were right: it might indeed be a skill issue if LLMs help a developer, and if they do, it might be that the dev is early in their career. Do LLMs help you, out of curiosity?
> Our analysis reveals that LLM-assistants offer both considerable benefits and critical risks. Commonly reported gains include minimized code search, accelerated development, and the automation of trivial and repetitive tasks. However, studies also highlight concerns around cognitive offloading, reduced team collaboration, and inconsistent effects on code quality.
Why are you ignoring the existence of these 37 other studies and pretending the one study you keep sharing is the only one in existence and thus authoritatively conclusive?
Furthermore from the study you keep sharing, they state:
> We do not provide evidence that:
> AI systems do not currently speed up many or most software developers. Clarification: We do not claim that our developers or repositories represent a majority or plurality of software development work
Why do YOU claim that this study provides evidence, conclusively and as settled science, that AI systems do not speed up many or most developers? You are unscientifically misrepresenting the study you are so eager to share. You are a complete “hype man” for this study, beyond what it evidences, because of your eagerness for a way to shut down discourse and dismiss any progress since the study’s focus on Sonnet 3.5. The study you share even says that there has been a lot of progress in the last five years, that future progress as well as different techniques for using the tools may produce productive results, and that it doesn't evidence otherwise! You are unserious.
Imagine if you'd worked for a decade as a dev using Notepad as your code editor (in a world where that was the best editor, somehow). You'd have developed your whole career in Notepad and known very well how to work with it.
Then, someone did a two-week study on the productivity difference between Notepad, vim, emacs, and VSCode. And it turns out that there was lower observed productivity for all of the latter three, with the smallest reduction seen in VSCode.
Would you conclude that Notepad was the best editor, followed by VSCode and then vim and emacs being the worst editors for programming?
That’s the flaw I see in the methodology of that study. I’m glad they did it, but the amount of “Haha, I knew it all along and if you claim AI helps you at all, it’s just because you sucked all along…” citing of that study is astonishing.
> citing of that study is astonishing and somewhat comical
I would like to see your study, one that's not sponsored by OpenAI or GitHub, that shows LLMs actually improved anything for experienced developers. Crickets.
So, to summarize:
1. An actual study shows that experienced developers' productivity declines 19% when using an LLM.
I don't think the study is flawed. It just seems rather narrow:
"We conduct a randomized controlled trial (RCT) to understand how AI tools at the February-June 2025 frontier affect the productivity of experienced open-source developers. 16 developers with moderate AI experience complete 246 tasks in mature projects on which they have an average of 5 years of prior experience."
So the question is what other kinds of software development tasks this result applies to. Moderate AI experience is fine. This applies to many other situations. But 5 years of experience with a single code base is an outlier.
That said, they used relatively large repositories (1.1 million LOC) and the tasks were randomly assigned. So developers couldn't pick and choose tasks in areas of the codebase they already knew extremely well.
I think the study does generalise to some degree, but I've seen conclusions drawn from this study that the methodology doesn't support. In my view, it doesn't generalise over all or even most software development tasks.
Personally, I'm a bit sceptical (but not hostile) about LLMs for coding (and some other thinking tasks), because the difference in quality between requests for which there are many examples and tasks for which there are only a few examples is so extreme.
Reasoning capabilities of LLMs still seem minimal.
My argument is limited to “we don’t know and one study with significant limitations with regards to participant adaptation doesn’t settle anything definitively for the long-term”.
Your argument seems to project significantly more certainty and spittle.
The LLM crowd always sees itself as both messianic and victimized, eerily reminding me of the NFT crowd from a year ago. I would not be surprised if a lot of them are the same folks.
The burden of proof is on the ones saying a new concept or tool (LLMs, NFTs) is revolutionary or useful. I provided studies showing not only that the new concept is not revolutionary, but that it is a step back in terms of productivity. Where are the studies and evidence proving that LLMs are a revolution?
NFT boosters tried for years to make us believe something that wasn't there. I will take the LLM crowd more seriously when I actually see the impact and usefulness of LLMs. For now, it's simply not there.
> Your argument seems to project significantly more certainty and spittle.
I am not surprised that a bunch of folks outsourcing their critical thinking to a fancy autocomplete don't have any arguments or studies, though, to refute a pretty simple argument with some receipts behind it. Spittle? Please; at least there is an argument and links here.
From the LLM cult crowd there is usually nothing, just crickets. Show me the studies, show me the links, show me the proof that LLMs are the revolution you so desperately want them to be.
Until then, I've got the receipts showing that, if anything, LLMs are just another tool, and hardly a revolution worth paying attention to.