
This is uncanny, I was going to write almost this exact comment. I've been told mine is due to a deficiency in working memory, which can then lead to the brain not converting things to long-term memory, something ADHDers commonly present with.

I'm in the opposite camp - I also have poor working memory, but instead I have extremely good episodic memory. Good enough that I routinely have to remind other people about our shared experiences. I'd never claim to have eidetic memory, but I've only met a couple of other people who have memory like mine (one is my mother, so that's kind of cheating).

It's interesting, though, because I also relate to a lot of what the article says about spatial and knowledge memory. I often have to remember where something was to "step into" the memory again.


Depression causes similar memory impairments.

Well said, I think this is essentially what people who practice Radical Candour [0] do.

I would much rather my manager give me the harsh truth upfront rather than letting it simmer, and making things worse for both of us.

[0] https://en.m.wikipedia.org/wiki/Radical_Candor


I suspect a large proportion of claims about productivity increases are skewed by the fact that the speed at which AI produces code makes you _feel_ productive, but these gains are largely offset by the effort to understand, review, refactor and clean up that code. The high you get when something "works" tends to stick in your memory more than the day you had to spend cleaning up dead code, refactoring 2k-line modules into a more readable project structure, etc.

I'm not saying that AI can't make you productive, it's just that these claims are really hard to verify. Even the recently posted Cloudflare OAuth worker codebase took ~3 months to release (8 Mar - 20 May), producing a single file with >2k lines. Is that going to be harder to maintain than a codebase with a proper project structure that's easily parseable by a human?


> Even the recently posted Cloudflare OAuth worker codebase took ~3 months to release (8 Mar - 20 May)

This is incorrect. The library was part of the MCP framework we launched on March 25 -- the same month development began:

https://blog.cloudflare.com/remote-model-context-protocol-se...

Indeed the speed with which we were able to turn this around was critical to us, as it allowed us to have our Remote MCP framework ready immediately when the spec was finalized, which led to quite a few companies building MCP servers on Cloudflare: https://blog.cloudflare.com/mcp-demo-day/

I'm not an AI maximalist. I still write lots of code by hand, because there's a lot AI isn't good at. It's good at boilerplate and straightforward code, it's bad at refactoring deep systems. But AI assistance was undeniably a huge win for the OAuth project. There's no way I could have written that library by hand so quickly. (Maybe when I was 25 and had no responsibilities, but these days I have like 1 solid day a week to actually write code...)


First commit: Feb 27th 2025

commit 3b2ae809e9256d292079bb15ea9fe49439a0779c
Author: Kenton Varda <[email protected]>
Date:   Thu Feb 27 17:04:12 2025 -0600

    Have Claude write an OAuth provider implementation.

Fine, sorry, apparently both of meander_water's dates were incorrect, and I actually started the work two days before March. It was still less than a month from there to release, though.

Apologies, I didn't mean to misrepresent your work. Big fan of your work by the way, I was a happy user of sandstorm.io back in the day.

Ok, sorry to get abstract, but to me what you're talking about is the difference between understanding and correctness. We as humans, for now, need to understand the code, and that's not easily transmitted from the output of some AI. In fact, that's a hard problem. But I don't think it's impossible for AI to assist humans with that. The AI could help walk humans through the code so they can quickly understand what's going on. Maybe ultimately the issue here is trust. Do we trust the AI to write code? Maybe we spend more time trying to verify it for now. I think that shows we place a lot of trust in humans to write code. Maybe that changes.

This is cope. I know my own experience, and I know the backgrounds and problem domains of the friends I'm talking to that do this stuff better than I do. The productivity gains are real. They're also intuitive: if you can't look at a work week and spot huge fractions of work that you're doing that isn't fundamentally discerning or creative, but rather muscle-memory rote repetition of best practices you've honed over your career, you're not trying (or you haven't built that muscle memory yet). What's happening is skeptics can't believe that an LLM plus a couple hundred lines of Python agent code can capture and replicate most of the rote work, freeing all that time back up.
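
To make "a couple hundred lines of Python agent code" concrete, the core loop is roughly the sketch below. This is illustrative only: `call_llm`, its reply format, and the single shell tool are made-up stand-ins for whatever model client and tool set you actually wire up.

    import subprocess

    def run_shell(cmd: str) -> str:
        # Tool: run a shell command and hand the combined output back to the model.
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True, timeout=120)
        return result.stdout + result.stderr

    TOOLS = {"run_shell": run_shell}

    def agent(task: str, call_llm, max_steps: int = 25) -> str:
        # Loop: ask the model for the next action, execute it, feed the result back.
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):  # hard cap so a runaway agent eventually stops
            reply = call_llm(messages)  # assumed to return {"tool": ..., "args": ...} or {"done": ...}
            if "done" in reply:
                return reply["done"]
            output = TOOLS[reply["tool"]](**reply["args"])
            messages.append({"role": "tool", "content": output})
        return "stopped: step limit reached"

Everything interesting lives in the prompts and the tool set; the loop itself is boring, which is sort of the point.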

Another thing I think people are missing is that serious LLM-using coders aren't expecting 100% success on prompts, or anything close to it. One of the skills you (rapidly) develop is the intuition for when to stop a runaway agent.

If an intern spun off hopelessly on a task, it'd be somewhat problematic, because there are finite intern hours and they're expensive. But failed agent prompts are nickel-denominated.

We had a post on the front page last week about someone doing vulnerability research with an LLM. They isolated some target code and wrote a prompt. Then they ran it one hundred times (preemptively!) and sifted the output. That approach finds new kernel vulnerabilities!
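
Mechanically that workflow is barely more than the sketch below; `call_llm` is a hypothetical wrapper around whatever API they used, and the real "sifting" was human triage rather than a keyword grep.

    def sweep(prompt: str, call_llm, n: int = 100) -> list[str]:
        # Sample the same prompt n times; a genuine finding only has to show up once.
        return [call_llm(prompt) for _ in range(n)]

    def sift(reports: list[str], marker: str = "use-after-free") -> list[str]:
        # Crude first-pass filter; the survivors still get read by a human.
        return [r for r in reports if marker in r.lower()]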

Ordinary developers won't do anything like that, but they will get used to the idea of only 2/3 of prompts ending up with something they merge.

Another problem I think a lot of skeptics are running into: stop sitting there staring at the chain of thought logs.


The thing I don't understand is that you keep bringing up your friends' experience in all your responses and in the blog itself. What about your own experience, and the success rate and productivity gains you've observed with AI agents? It feels like you aren't confident in your own gains and have to bring in secondhand experience from your friends to prop up your arguments.

> if you can't look at a work week and spot huge fractions of work that you're doing that isn't fundamentally discerning or creative, but rather muscle-memory rote repetition of best practices you've honed over your career, you're not trying (or you haven't built that muscle memory yet). What's happening is skeptics can't believe that an LLM plus a couple hundred lines of Python agent code can capture and replicate most of the rote work, freeing all that time back up.

No senior-level engineer worth their salt, and in any kind of minimally effective organization, is spending any meaningful amount of their time doing the rote repetition stuff you're describing here. If this is your experience of work then let me say to you very clearly: your experience is pathological and non-representative and you need to seek better employment :)


Regardless of what people say to you about this, most (all?) undergraduates in CS programs are using LLMs. It's extremely pervasive. Even people with no formal training are using AI and vercel and churning out apps over the weekend. Even if people find reasons to dislike AI code writing, culturally, it's the future. I don't see that changing. So either a huge percent of people writing code are doing it all wrong or times are changing.

Just a data point:

I think it has a lot to do with the type of work you are doing. I am a couple of years into a very small startup that has some actual technology built (as opposed to a really simple CRUD app or something).

When I am working on the front-end, where things are pretty simple, AI is a huge speed up. What it does VERY well is latch on to patterns and then apply those patterns to other things. If it has a couple of examples you can point it to and say "ok build that but over here", the newest revisions of Claude and Gemini are perfectly capable of building the whole thing end to end. Because it's a fairly repetitive task I don't have to spend much time untangling it. I can review it, pattern match against things that don't look right, and then dive into those.

For a real example, I needed a page for a user to manually add a vendor in our platform. A simple prompt asking Claude to add a button to the page sent it into a mode where it added the button, built the backend handler, added the security checks, defined a form, built another handler to handle the submitted data, and added it to the database. It even wrote the ACL correctly. The errors it introduced were largely around using vanilla HTML in place of our standard components and some small issues with how it attempted to write to the DB using our DB library. This saved me a couple of hours of typing.

Additionally, if I need to refactor something, AI is a godsend. Just today an underlying query builder completely changed its API and broke... everything. Once I identified how I wanted to handle the changes and wrote some utilities, I was able to have Claude find everything everywhere and make those same changes. It did it with like 90% accuracy. Once again that saved me a couple of hours.

Where it fails, usually spectacularly, is when we get to the stuff that is new or really complex. If it doesn't have patterns to latch onto it tries to invent them itself and the code is garbage. Rarely does it work. Attempting to vibe code it with increasingly more pointed prompts will often result in compiling code but almost never will it do the thing I actually wanted.

In these contexts its usefulness is mostly things like "write a sql query to do X", which occasionally surfaces a technique I hadn't thought about.

So my experience is pretty mixed. I am definitely saving time. Most of it is typing time, not thinking time, which is like 1/3 of my average day. If I had to guess I am somewhere in the neighborhood of 30-40% faster today than I was in 2019. Notably that speed up has allowed me to really stretch this funding round, as we are well past the phase where we would have typically hired people in my past companies. Usually someone relatively mid-level to take over those repetitive tasks.

Instead it's just me and a non-technical founder going along super quickly. We will likely be at a seed round before anyone new comes in.


I came across this recently on Fiverr [0]. I thought it was a joke initially, but the volume of people offering their services implies that there is demand out there somewhere.

[0] https://www.fiverr.com/categories/online-marketing/generativ...


Absolute review count suggests otherwise.

It might be that there are more shovel sellers than gold to mine.

I've recently started wondering what the long-term impacts of AI slop are going to be. Will people get so sick of the sub-par quality that there will be a widespread backlash and a renewed focus on handmade, artisanal products? Or will we go the other way, where everyone accepts the status quo, everything just gets shittier, and we have multiple cycles of AI slop trained on AI slop?

I'm already seeing screen-free summer camps in my area. There's going to be a subset of the population that does not want to play along with calling hallucinations and deepfakes "progress," and kids will be homeschooled more as parents lose their jobs and traditional classroom instruction loses effectiveness.

I thought the movie "The Creator" was pretty neat. It envisions a future where AI gets blamed for accidentally nuking Los Angeles, so America bans it and reignites a kind of cold war with Asia, which has embraced GAI and transcended the need for central governance. Really it's a film about war and how it can be started with a lie but continue out of real existential fear.


I'll guess it will be both at the same time with a far greater number of people going for the easier (latter) option, but still a real chunk of people going for what's real, and also a spectrum in between.

This is how it already is for most aspects of life that have, for many, been enshittified by progress. Sadly the shitty part is not entirely avoidable by choice.


I wonder if this is going to change the ad/marketing industry. People generally put up with shitty ads, and these will be much cheaper to produce. I dread what's coming next.

This looks interesting, but anytime security is offloaded to an LLM I am extremely skeptical. IMO the right way to do this is to enforce permissions explicitly through an AuthZ policy. Something like what Toolhive [0] is doing is the right way, I think.

All MCP comms from client to server go through an SSE proxy which has AuthN and AuthZ enabled. You can create custom policies for AuthZ using Cedar [1].
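
To make the contrast concrete, the proxy makes a deterministic allow/deny decision before anything reaches the MCP server, rather than asking a model to judge the request. A rough sketch of the idea follows; Toolhive expresses this as Cedar policies, so the hard-coded table and function names here are purely illustrative.

    # Illustrative only: a static table standing in for a real Cedar policy.
    POLICY = {
        "alice": {"read_file", "search_docs"},  # tools this principal may call
        "ci-bot": {"run_tests"},
    }

    def authorize(principal: str, tool: str) -> bool:
        # Deterministic decision: no LLM in the loop, so it can't be talked out of a deny.
        return tool in POLICY.get(principal, set())

    def forward_tool_call(principal: str, tool: str, args: dict, upstream):
        if not authorize(principal, tool):
            raise PermissionError(f"{principal} may not call {tool}")
        return upstream(tool, args)  # only reached if the policy allows it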

[0] https://github.com/stacklok/toolhive, https://github.com/stacklok/toolhive/blob/main/docs/authz.md

[1] https://docs.cedarpolicy.com/


This is really interesting, I'll check it out. At least in its current form this seems like it would take some effort to set up - we're focusing heavily on making MCP Defender easy to set up in less than a minute and then forgetting about it as it runs in the background.

> we're focusing heavily on making MCP Defender easy to set up in less than a minute and then forgetting about it as it runs in the background

an admirable goal!

given the fallibility of LLMs, are you sure it's a good idea that they forget about it?

that seems like it has the same risks as having no security (perhaps worse, lulling people into a false sense of security)

are you sure the LLM doing security can't be tricked/attacked using any of the usual methods?


Can you explain how this compares to Kata Containers? [0] That also supports OCI to run microVMs. You can also choose different hypervisors such as firecracker to run it on.

[0] https://katacontainers.io/


Kata Containers is an interesting project. Microsandbox is a more opinionated project with a UX that focuses on getting up and running with microVMs quickly. I want this experience for Linux, macOS and Windows users.

More importantly, it's about making sandboxing really accessible to AI devs with `msb server`.


I'm not sure about the assertion that this is the first vulnerability found with an LLM. For example, OSS-Fuzz [0] has found a few using fuzzing, and Big Sleep has found some using an agent approach [1].

[0] https://security.googleblog.com/2024/11/leveling-up-fuzzing-...

[1] https://googleprojectzero.blogspot.com/2024/10/from-naptime-...


It's certainly not the first vulnerability found with an LLM =) Perhaps I should have been more clear though.

What the post says is "Understanding the vulnerability requires reasoning about concurrent connections to the server, and how they may share various objects in specific circumstances. o3 was able to comprehend this and spot a location where a particular object that is not reference counted is freed while still being accessible by another thread. As far as I'm aware, this is the first public discussion of a vulnerability of that nature being found by a LLM."

The point I was trying to make is that, as far as I'm aware, this is the first public documentation of an LLM figuring out that sort of bug (non-trivial amount of code, bug results from concurrent access to shared resources). To me at least, this is an interesting marker of LLM progress.


Totally agree, I personally have Obsidian set up on multiple devices, and they all automatically sync to my local Synology NAS.

