Joking aside, if he is one of the top developers in the company, and if he is "actually" a good developer compared to others outside the company, then I can see this bill.
The current feature I'm working on required 100 messages to finalize, and I would say the context window was around 35k-50k tokens per "chat completion". My model of choice is Gemini 2.5 Flash, which has an input cost of $0.30/1M tokens. Compare this to Sonnet, which is $3.00/1M.
If the person was properly designing and instructing the LLM to build something advanced, I can see the bill being quite high. I personally don't think you need Sonnet 99% of the time, but if somebody else is willing to pay the bill, why not.
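Back-of-envelope, using the numbers above (input tokens only, output ignored for simplicity):

    # Rough input-only cost for the feature described above: 100 completions
    # at roughly 50k context tokens each, comparing the two published rates.
    messages = 100
    tokens_per_completion = 50_000            # upper end of the 35k-50k range
    flash_rate = 0.30 / 1_000_000             # $/input token, Gemini 2.5 Flash
    sonnet_rate = 3.00 / 1_000_000            # $/input token, Claude Sonnet

    total_tokens = messages * tokens_per_completion
    print(f"Flash:  ${total_tokens * flash_rate:.2f}")   # ~$1.50
    print(f"Sonnet: ${total_tokens * sonnet_rate:.2f}")  # ~$15.00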
A major issue we have right now is that we want the coding process to be more "Agentic", but we don't have an easy way for LLMs to determine what to pull into context to solve a problem. This is a problem I am working on with my personal AI search assistant, which I talk about below:
Analyzers are the "Brains" for my search, but generating the analysis is both tedious and potentially costly. I'm working on the tedious part, and with batch processing you can probably process thousands of files for under 5 dollars with Gemini 2.5 Flash.
With batch processing and the ability to continuously analyze 10s of thousands of files, I can see companies wanting to make "Agentic" coding smarter, which should help with GPU utilization and drive down the cost of software development.
No, what I am saying is that there are more applications for batch processing that will help with utilization. I can see developers and companies using off-hour processing to prep their data for agentic coding.
As a bit of a side note, I want to like Cerebras, but using any of the models through OpenRouter that route to them has led to too many throttling responses. You can't seem to make even a few calls per minute. I'm not sure if Cerebras is throttling OpenRouter or if they are throttling everybody.
If somebody from Cerebras is reading this, are you having capacity issues?
You can get your own key with Cerebras and then use it in OpenRouter. It's a little hidden, but for each provider you can explicitly provide your own key. Then it won't be throttled.
I think for most use cases, it doesn't make much sense to use vector DBs. When I started to design my AI Search feature, I researched chunking a lot, and the general consensus was that you can lose context if you don't chunk in the right way, and there wasn't really a right way to chunk. That is why I decided to take the approach I am using today, which I talk about in another comment.
With input costs for very good models at $0.30/1M tokens for Gemini 2.5 Flash (batch rates would be $0.15/1M), feeding the LLM thousands of documents to generate summaries would probably cost 5 dollars or less at batch pricing. With those input costs, and with most SOTA LLMs able to handle 50k tokens of context with no apparent loss in reasoning, I really don't see the reason for vector DBs anymore, especially if it means potentially less accurate results.
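A rough sketch of that math, assuming a corpus of a few thousand files averaging a few thousand input tokens each (the per-file figure is a guess; output tokens for short summaries are not included):

    # Batch-rate input cost to summarize a corpus with Gemini 2.5 Flash.
    files = 5_000
    avg_input_tokens = 4_000                  # assumed average file size in tokens
    batch_rate = 0.15 / 1_000_000             # $/input token at the batch rate

    cost = files * avg_input_tokens * batch_rate
    print(f"~${cost:.2f} in input cost")      # ~$3.00 for 20M input tokens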
Actually, chunking isn't such a bad problem with code, it chunks itself, and code embeddings produce better results. The problem is that RAG is fiddly, and people try to just copy a basic template or use a batteries included lib that's tuned to QA, which isn't gonna produce good results.
> Actually, chunking isn't such a bad problem with code, it chunks itself, and code embeddings produce better results.
I can't remember what post I read this in (it was on Hacker News), but when designing Claude Code, they (Anthropic) apparently tried a RAG approach and it didn't work very well compared to loading in the full file. If my understanding of how Claude Code works is correct (based on comments from others), it "greps like an intern/junior developer". So what Claude Code does (assuming grep is the key) is ask Sonnet for keywords to grep for based on the user's query, then continuously revise the grep keywords until it is satisfied with the files it has found.
As ridiculous as this sounds, the approach is not horrible, albeit very inefficient. My approach focuses on capturing intent, which is what grep can't match. And with RAG, if the code is not chunked correctly and/or the code is just badly organized, you may miss the true intent of the code.
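Based on that description, a minimal sketch of the loop (ask_llm is a made-up stand-in for the actual Sonnet call, so treat this as an illustration, not how Claude Code is actually implemented):

    import subprocess

    def ask_llm(query, files_so_far):
        # Hypothetical stand-in for a real Sonnet call; it would be asked to
        # propose grep keywords and to say whether the matches look sufficient.
        return {"keywords": ["conversation", "history", "store"], "satisfied": True}

    def grep_files(keywords, repo="."):
        # Return the files matching any keyword (filenames only, case-insensitive).
        files = set()
        for kw in keywords:
            out = subprocess.run(["grep", "-rli", kw, repo],
                                 capture_output=True, text=True)
            files.update(out.stdout.splitlines())
        return sorted(files)

    def find_relevant_files(query, max_rounds=5):
        files = []
        for _ in range(max_rounds):
            resp = ask_llm(query, files)        # revise keywords based on what was found
            files = grep_files(resp["keywords"])
            if resp["satisfied"]:
                break
        return files

    print(find_relevant_files("How does Aider store conversations?"))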
Oh yeah, loading in full files when possible is great. I use Gemini Pro to look at bundles of my whole codebase; the level of comprehension it gets from that is pretty shocking.
This is why I think vector DBs are probably not going to be used for a lot of applications in the future. They served a very valid purpose when context windows were a lot smaller and LLMs were not as good, but moving forward, I personally think they make less and less sense.
Vector DBs will still be around to do a first pass before feeding data in to a long context reasoner like Gemini in most cases. The thing that's going to go away is rerankers.
In a nutshell, it generates a very short summary of every document along with keywords. The basic idea is to use BM25 ranking to identify the most relevant documents for the AI to review. For example, my use case is to understand how Aider, Claude Code, etc., store their conversations so that I can make them readable in my chat app. To answer this, I would ask 'How does Aider store conversations?' and the LLM would construct a deterministic keyword search using terms that would most likely identify how conversations are stored.
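A rough sketch of that first pass, assuming Python and the rank_bm25 package (the file paths and summaries below are invented for illustration):

    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    # "Tiny overview" summaries keyed by file path (invented examples).
    summaries = {
        "aider/history.py": "Manages chat history and writes conversation markdown logs.",
        "aider/io.py": "Terminal input/output helpers for the interactive chat loop.",
        "aider/models.py": "Model metadata, context limits and token cost tables.",
    }
    paths = list(summaries)
    bm25 = BM25Okapi([summaries[p].lower().split() for p in paths])

    # Keywords the LLM might construct for "How does Aider store conversations?"
    query = "store conversation history".split()
    ranked = sorted(zip(paths, bm25.get_scores(query)), key=lambda x: x[1], reverse=True)
    for path, score in ranked:
        print(f"{score:6.3f}  {path}")   # top hits are handed back to the LLM to review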
Once I have the list of files, the LLM is asked again to review the summaries of all matches and suggest which documents should be loaded in full for further review. I've found this approach to be inconsistent, however. What I've found to work much better is just loading the "Tiny Overview" summaries into context and chatting with the LLM. For example, I would ask the same question: "Which files do you think can tell me how Aider stores conversations? Identify up to 20 files and create a context bundle for them so I can load them into context." For a thousand files, you can easily fit three-sentence summaries for each of them without overwhelming the LLM. Once I have my answer, I just need a few clicks to load the files into context, and then the LLM will have full access to the file content and can better answer my question.
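For the "just load the overviews" variant, the prompt assembly is trivial; a sketch under the same assumptions (build_prompt and the summaries dict are illustrative, not my app's actual API):

    # Toy overviews (same idea as above); in practice there would be ~1,000 of these.
    summaries = {
        "aider/history.py": "Manages chat history and writes conversation markdown logs.",
        "aider/io.py": "Terminal input/output helpers for the interactive chat loop.",
        "aider/models.py": "Model metadata, context limits and token cost tables.",
    }

    # Pack every tiny summary into one prompt and ask the LLM to pick files.
    def build_prompt(summaries, question, max_files=20):
        listing = "\n".join(f"- {path}: {text}" for path, text in summaries.items())
        return (f"File overviews:\n{listing}\n\n"
                f"{question} Identify up to {max_files} files and create a "
                f"context bundle for them so I can load them into context.")

    prompt = build_prompt(summaries, "Which files can tell me how Aider stores conversations?")
    # ~4 chars per token is a common rule of thumb; 1,000 short summaries at
    # ~40 tokens each is roughly 40k tokens, which fits the context sizes above.
    print(f"~{len(prompt) // 4} tokens")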
> Do people really try to one-shot their AI tasks?
Yes. I almost always end with "Do not generate any code unless it can help in our discussions, as this is the design stage." I would say 95% of my code for https://github.com/gitsense/chat in the last 6 months was AI generated, and about 80% of that was one-shot.
It is important to note that I can easily get into 30+ messages of back and forth before any code is generated. For complex tasks, I will literally spend an hour or two (which can span days) chatting and thinking about a problem with the LLM, and I do expect the LLM to one-shot it.
Not even remotely, since the 5% that I need to write is usually quite complex. I do think my writing proficiency will decrease, though. However, my debugging and problem-solving skills should increase.
Having said all of that, I do believe AI will have a very negative effect on developers where the challenge is skill and not time. AI is implementing things that I could do if given enough time. I am literally implementing things in months that would have taken me a year or more.
My AI search is nontrivial, but it only took two months to write. I should also note that the 5% I needed to implement was the difference between throwaway code and a usable search engine.
>Not even remotely since the 5% that I need to write is usually quite complex.
Not sure I believe this. If you suddenly automate away 95% of any task, how could it be the case you retain 100% of your prior abilities?
>However my debugging and problem solving skills should increase
By "my", I assume you mean "my LLM"?
>I do think my writing proficiency will decrease though.
This alone is cause for concern. The ability for a human being to communicate without assistance is extremely important in an age where AI is outputting a significant fraction of all new content.
> Not sure I believe this. If you suddenly automate away 95% of any task, how could it be the case you retain 100% of your prior abilities?
I need to review like crazy now, so it is not like I am handing off my understanding of the problem. If anything, I learn new things from time to time, as the LLM will generate code in ways that I haven't thought of before.
The AI genie is out of the bottle now, and I do believe that in a year or two, companies are going to start asking for the conversations along with the LLM-generated code, which I guess is how you can determine if people are losing their skills. When my code is fully published, I will include conversations for every feature/bug fix that is introduced.
> The ability for a human being to communicate without assistance is extremely important
I agree with this, but once again, it isn't like I don't have to review everything. When LLMs get much better, I think my writing skills may decline, but as it currently stands, I find myself having to revise what the LLM writes to make it sound more natural.
Everything is speculation at this point, but I am sure I will lose some skills. I also think I will gain new ones by being exposed to approaches I haven't thought of before.
I wrote my chat app because I needed a more comfortable way to read and write *long* messages. For the foreseeable future, I don't see my writing proficiency decreasing in any significant way. I can see myself becoming slower at writing in the future, though, as I find myself being very comfortable speaking to the LLM in a manner I would not use with a human. LLMs are extremely good at inferring context, so I do a lot of lazy typing now to speed things up, which may turn into a bad habit.
But that isn't what the author is talking about. The issue is that your good code can be equal to slop that works. What the author says needs to happen is that you need to find a better way to stand out. I suspect that for many businesses where software superiority is not a core requirement, slop that works will be treated the same as non-slop code.
You are focusing on code. That is the wrong focus. Creating code was never the job. The job was being trustworthy about what I deliver and how.
AI is not worthy of trust, and the sort of reasonable people I want to deal with won’t trust it and don’t. They deal with me because I am not a simulation of someone who cares— I am the real thing. I am a purple cow in terms of personal credibility and responsibility.
To the degree that the application of AI is useful to me without putting my credibility at risk, I will use it. It does have its uses.
(BTW, although I write code as part of my work, I stopped being a full-time coder in my teens. I am a tester, testing consultant, expert witness, and trainer now.)
Until that slop that works leads to Therac-26 or PostOfficeScandal2 electric boogaloo. Neither of those applications required software superior to their competitors, just working software.
The average quality of software can only trend down so far before real-world problems start manifesting, even outside of businesses with a hard requirement on "software superiority".
It's so bizarre to me seeing these comments as a professional software engineer. Like, you do realize that at least 80% of the code written in large companies like Microsoft, Amazon, etc. was slop long before AI was ever invented, right?
The stuff you get to see in open source, papers, and academia is a very small, curated 1%. The rest is glue code written by an overworked engineer at 1am that holds literally everything together.
Why is it bizarre? I’m a tester with 38 years in the business. I’ve seen pretty much every kind of project and technology.
I was testing at Microsoft on the week that Windows 2000 shipped, showing them that Photoshop can completely freeze Windows (which is bad, and something they needed to know about).
The creed of a tester begins with faith in the existence of trouble. This does not mean we believe anything is perfectible; it means we think it is necessary to be vigilant.
AI commits errors in a way and to a degree that should alarm any reasonable engineer. But more to the point: it tends to alienate engineers from their work so that they are less able to behave responsibly. I think testing is more important than ever, because AI is a Gatling gun of risk.
> 120K a year developer, if it makes them even 20% more efficient (doubt that), you have a ceiling of $2000/mo
I don't think businesses see it this way. They sort of want you to be 20% more efficient by being 20% better (with no added cost). I'm sure the math is: if their efficiency is increased by 20%, that means they can reduce head count by 20% or not hire new developers.
Oh, it's much worse than that: they think that most developers don't do anything and that the core devs are just supported by the ancillary devs, with 80% of the work done by core devs and 20% otherwise.
In many workplaces this is true.
That means an "ideal" workplace is 20% of its current size, with AI doing all the work that the non-core devs used to do.
> I wish people could be a bit more open about what they build.
I would say that for the last 6 months, 95% of the code for my chat app (https://github.com/gitsense/chat) was AI generated (98% human architected). I believe what I created in the last 6 months was far from trivial. One of the features AI helped a lot with was the AI Search Assistant. You can learn more about it here: https://github.com/gitsense/chat/blob/main/packages/chat/wid...
As a debugging partner, LLMs are invaluable. I could easily load all the backend search code into context and have it trace a query and create a context bundle with just the affected files. Once I had that, I would use my tool to filter the context to just those files and then chat with the LLM to figure out what went wrong or why the search was slow.
I very much agree with the author of the blog post about why LLMs can't really build software. At the same time, AI is an industry game changer, as it can truly 3x to 4x senior developers in my opinion. I should also note that I spend about $2 a day on LLM API calls (99% to Gemini 2.5 Flash), and I probably have to read 200+ LLM generated messages a day and reply back in great detail about 5 times a day (think of an email instead of chat message).
Note: The demo that I have in the README hasn't been set up, as I am still in the process of finalizing things for release, but the NPM install instructions should work.
> probably have to read 200+ LLM generated messages a day and reply back in great detail about 5 times a day (think of an email instead of chat message).
I can think of nothing more tiresome than having to read 200 emails or LLM chat messages a day, and then respond in detail to 5 of them. It wouldn't lead to a "3x to 4x" performance gain after tallying up all the time spent reading messages and replying. I'm not sure people who use LLMs this way are really tracking their time enough to say with any confidence that "3x to 4x" is anywhere close to reality.
A lot of the messages are revisions, so it is not as tedious as it may seem. As for the "3x to 4x", this is my own experience. It is possible that I am an outlier, but 80% of the AI-generated code that I have is one-shot. I spend an hour or two (usually spread over days of thinking about the problem) to accomplish something that would have taken a week or more for me to do.
I'm going to start producing metrics regarding how much code is AI generated along with some complexity metrics.
I am obviously biased, but this definitely feels like a paradigm shift, and if people do not fully learn to adapt to it, it might be too late. I am not sure if you have ever watched Gattaca, but this sort of feels like it... the astronaut part, that is.
The profession that I have known for decades is starting to feel very different, in the same way that watching Gattaca changed my perception of astronauts. It was strange, but plausible, and that is what I see for the software industry. Those who can articulate the problem, I believe, will become more valuable than the silent genius.
The same noise was made about pair programming and it hasn't really caught on. Using LLMs to write code is one way of getting code written, but it isn't necessarily the best, and it seems kind of fad-ish honestly. Yes, I use "AI" in my coding workflow, but it's overall more annoying than it is helpful. If you're naturally 3x-4x times slower than I am, then congratulations, you're now getting up to speed. It's all pretty subjective I think.
This is very measurable, as you are not measuring against others, but yourself. The baseline is you, so it is very easy to determine if you become more productive or not. What you are saying is, you do not believe "you" can leverage AI to be more efficient than you currently are, which may well be true due to your domain and expertise.
No matter what "AI" can or can't do for me, it's being forced on us all anyway, which kind of sucks. Every time I select something the AI wrote it's collecting a statistic and I'm sure someone is probably monitoring how much we use the "AI" and that could become a metric for job performance, even if it doesn't really raise quality or amplify my output very much.
> being forced on us all anyway, which kind of sucks
Business is business, and if you can demonstrate that you are needed they will keep you, for the most part, but business also has politics.
> probably monitoring how much we use the "AI" and that could become a metric for job performance
I will bet on this and take it one step further. They (employer) are going to want to start tracking LLM conversations. If everybody is using AI, they (employer) will need differentiators to justify pay raises, promotions and so forth.
Are you implying that someone starting to use AI now has already been left so far behind by experienced users that they would never catch up? That seems ridiculous - it seems to be getting better understood with time, which should make catching up increasingly easier.
It's actually more than 6 months. 6 months ago was when I had developed enough to start chatting with AI and be really productive. Moving forward, once the licence is in place and the files become unminified, you can track exactly what AI generated.
It summarized the instructions required to install and set up. It (Gemini and Sonnet) did fail to mention that I needed to set up a server and create a DNS entry for the subdomain.
I've been able to create a very advanced search engine for my chat app that is more than enterprise ready. I've spent a decade thinking about search, but in a different language. Like you, I needed to explain what I knew about writing a search engine in Java so the LLM could write it in JavaScript using libraries I did not know, and it got me 95% of the way there.
It is also incredibly important to note that the 5% I needed to figure out was the difference between throwaway code and something useful. You absolutely need domain knowledge, but LLMs are more than enterprise ready in my opinion.
Here is some documentation on how my search solution is used in my app to show that it is not a hobby feature.
Thanks for your reply. I am in the same boat, and it works for me, as it seems to work for you. So as long as we are effective with it, why not? Of course, I am not doing things blindly and expecting good results.