The book content itself is deliberately free of AI-generated prose. Drafts may start anywhere, but final text should be reviewed, edited, and owned by a human contributor.
There is more specificity around AI use in the project README. LLMs may have been used during drafting, which is likely why the "hallmarks" some commenters are pointing out stuck around.
That statement is honestly self-contradictory. If a draft was AI-generated and then reviewed, edited, and owned by a human contributor, then the parts which survived reviewing and editing verbatim were still AI-generated...
Why do you care? If a human reviewed and edited it, someone filtered it to make sure it's correct. It's been validated to be correct; that is the main point.
Clearly someone didn't make sure everything is correct, since they allowed a self-contradictory statement (whether generated by AI or by human) into the text...
People have the illusion of reviewing and "owning" the final product, but that is not how it looks from the outside. The quality, the prose style, the errors that pass through due to inevitable AI-induced complacency ALWAYS EVENTUALLY show. If people got out of the AI bubbles they would see it too, alas.
We have been reading the same stories for at least a couple of years now. There is no novelty anymore. The core issues and problems have stayed the same since GPT-3.5. And because they are so omnipresent on the internet, we have grown able to recognise them almost automatically. It is no longer just a matter of quality; it is an insult to the readers when an author pretends that content is not AI generated just because they "reviewed it". Reviewing something that somebody else wrote is not ownership, especially when that somebody is an LLM.
In any case, I do not care if people want to read or write AI generated books; just don't lie about whether it is AI generated.
I think the transition between things being done is maybe a bit more interesting.
"Build a house," is a thing, but at which point of the dreaming, designing, planning, permitting, architecting, financing, contracting, acquiring, installing, verifying, reworking, coordinating, inspecting, styling, unpacking, cleaning, landscaping, repairing, upgrading, and waiting, are you doing it?
Perhaps each discrete thing is itself a thing, and when you do a thing, or transition between things, is the matter needing clarifying.
If you are coordinating on how to do a thing, you are doing the thing, as long as the thing is coordinating on how to do the next thing. This "coordinating a thing" is a discrete thing we should not confuse for any other thing.
TFA's examples are unforgiving in that they suppose there is only ever one thing and all else should be nothing.
Hey! Have you come across the recent(ish) paper from Google researchers about self-replicators? In one of their experiments they used a self-modifying (metaprogrammable) variant of BrainFuck that I've found very interesting for evolutionary algorithms (EAs). I haven't fully replicated their findings as I've been experimenting with better ways to observe the evolution progress, but perhaps it might be interesting for your work as well.
I've read it many times. It introduced me to the idea of using self-modifying programs. I've built several prototypes that use their exact instruction set, but I haven't found a way to guide the evolution in a desired direction yet. The self-modifying aspect can be quite destructive to the genome.
I have to dig up my old code. I remember it was difficult to observe and identify the replicators. I don't remember following their "tracker" idea.
As you've mentioned, it can be quite self destructive, so I've been experimenting with the instruction set itself.
Each cell is one of 256 values, where only 10 of those values are instructions. In addition, the original instruction set is not uniformly distributed over those 256 values. This means that the likelihood of mutating destructively is highly affected by the ratio of valid instructions and their distribution. For example, a cell with value 0xD0 is much less likely to mutate to a valid instruction than a cell with value 0x0D (assuming the instructions sit at their ASCII byte values). By playing with these parameters to make the state space smaller, I've seen significantly different levels of stability.
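For a concrete feel of that asymmetry, here's a minimal sketch, assuming the ten BFF instructions sit at their ASCII byte values and mutation is a single bit flip (both assumptions are mine; the paper's mutation model may differ):

    # Count how many single-bit flips of `byte` land on a valid
    # BFF instruction, assuming ops live at their ASCII byte values.
    BFF_OPS = {ord(c) for c in "<>{}+-.,[]"}

    def reachable_ops(byte):
        return sum((byte ^ (1 << bit)) in BFF_OPS for bit in range(8))

    # 0x0D is one flip away from 0x2D ('-'); 0xD0 has no valid
    # neighbors, since every op byte is below 0x80.
    for b in (0x0D, 0xD0):
        print(f"0x{b:02X}: {reachable_ops(b)} valid single-flip neighbor(s)")

Sweeping reachable_ops over all 256 values gives a quick map of how mutation-friendly a given instruction placement is.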
I'd love to follow your work if you share it anywhere!
I'm inspired by this and want to extend it, perhaps telescopically, by discussing what the thing is.
Sometimes we see our task as being, "do C," and we forget the "B" and "A" that come before.
Maybe you can't do "C" without discussing it ("B") or researching how others did it ("A"). In these cases, we shouldn't simply think the thing is "C"—the thing must first be "A," then "B," and then, "C."
If we forget this, we're bound to think "C" is the only thing of value, that it should take an hour and not a week, or that people doing the "A's" or "B's" to enable the "Cs" must be doing nothing at all!
It's easy to underestimate that groundwork because it's less flashy or quantifiable. But skipping it usually just means C turns out half-baked, or takes even longer in the end.
Just yesterday I was helping a family member install roofing. The roof was done up to the slats, the roofing was big metal sheets; it should have been done in an hour. Except we spent a good five hours on various little details before the first sheet went up. And you can't exactly skip those, at least if you want the roof to stay there and the weather to stay outside.
It's not wrong, that's exactly what I'm paying them for. If they didn't have the education then they wouldn't be a doctor, and I wouldn't be seeing them for a consultation.
I'm well compensated not because I'm good at googling things, but because I have a proven track record of being good at googling things. If a junior was able to produce the same results they wouldn't be paid more.
If you pay a doctor, the thing you're doing is paying a doctor. Your "A" or "B" might be booking the appointment or figuring out how to send the payment. I'm not sure I follow.
I just tried an experiment using Spec-Kit from GitHub to build a CLI tool. Perhaps the scope of the tool doesn't lend itself to Spec-Driven Development, but I found the many, many hours of tweaking, asking, correcting, analyzing, adapting, refining, and reshaping before getting to see any code challenging. As would be the case with Waterfall today, the lack of iterative end-to-end feedback is foreign and frustrating to me.
After Claude finally produced a significant amount of code, and after realizing it hadn't built the right thing, I was back to the drawing board to find out what language in the spec had led it astray. Never mind digging through the code at this point; it would be just as good to start again as to try to onboard myself to the thousands of lines of code it had built... and I suppose the point is to ignore the code as "implementation detail" anyway.
Just to be clear: I love writing code with an LLM, be it for brainstorming, research, or implementation. I often write—and have it output—small markdown notes and plans for it to ground itself. I think I just found this experience with SDD quite heavy-handed and the workflow unwieldy.
I think the challenge is how to create a small but evolvable spec.
What LLMs bring to the picture is that "spec" is high-level coding. In normal coding you start by writing small functions then verify that they work. Similarly LLMs should perhaps be given small specs to start with, then add more functions/features to the spec incrementally. Would that work?
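For example (a purely hypothetical sketch, not from any SDD framework's docs), the spec could grow in small, verifiable passes:

    Spec v1: a CLI `notes` that appends a timestamped line to ~/notes.txt.
    Spec v2: everything in v1, plus a `notes list` subcommand that prints the last N entries.
    Spec v3: everything in v2, plus inline tags like #work and a `notes list --tag work` filter.

Each pass would be small enough to verify end-to-end before the next one lands, much like growing a program one function at a time.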
Thanks! With Spec-Kit and Claude Sonnet 4.5, it wanted to design the whole prod-ready CLI up front. It was hard, if not impossible, to try to scope it to just a single feature or POC. This is what I struggled with most.
Were I to try again, I'd do a lot more manual spec writing or even template rewrites. I expected it to work more-or-less out-of-the-box. Maybe it would've for a standard web app using a popular framework.
It was also difficult to know where one "spec" ended and the next began; should I iterate on the existing one or create a new spec? This might be a solved problem in other SDD frameworks besides Spec-Kit, or else I'm just overthinking it!
I did this first too. The trick is realising that the "spec" isn't a full system spec, per se, but a detailed description of what you want to do.
System specs are non-trivial for current AI agents. Hand-prompting every step is time-consuming.
I think (and I am still learning!) SDD sits as a fix for that. I can give it two fairly simple prompts & get a reasonably complex result. It's not a full system but it's more than I could get with two prompts previously.
The verbose "spec" stuff is just feeding the LLM's love of context, and, more importantly, reflects what I think we all know: you have to tell an agent over and over how to get the right answer or it will deviate.
Early on with speckit I found I was clarifying a lot, but I've since discovered that was just me being not so good at writing specs!
Example prompts for speckit:
(Specify) I want to build a simple admin interface. First I want to be able to access the interface, and I want to be able to log in with my Google Workspaces account (and you should restrict logins to my workspaces domain). I will be the global superadmin, but I also want a simple RBAC where I can apply a set of roles to any user account. For simplicity let's create user account records when they first log in. The first roles I want are Admin, Editor and Viewer.
(Plan) I want to implement this as a NextJS app using the latest version of Next. Please also use Mantine for styling instead of Tailwind. I want to use DynamoDB as my database for this project, so you'll also need to use Auth.js over Better Auth. It's critical that when we implement you write tests first before writing code; forget UI tests, focus on unit and integration tests. All API endpoints should have a documented contract which is tested. I also need to be able to run the dev environment locally so make sure to localise things like the database.
The plan step is overly focused on the accidental complexity of the project. While the `Specify` part is doing a good job of defining the scope, the `Plan` part is just complicating it. Why? The choice of technology is usually the first step in introducing accidental complexity into a project, which is why it's often recommended to go with boring technology (so the cost of this technical debt is known). Otherwise go with something that is already used by the company (if it's a side project, do whatever). If you choose to go that route, there's a good chance you already have good knowledge of those tools and have code samples (and libraries) lying around.
The whole point of code is to be reliable and to help do something that we'd rather not do ourselves, not to exist on its own. Every decision (even a little one) needs to be connected to a specific need that is tied to the project and the team. It should not be just a receptacle for wishes.
I wouldn't call that accidental complexity? It's just a set of preferences.
Your last point feels a bit idealistic. The point of code is to achieve a goal; there are ways to achieve that with optimal efficiency in construction, but a lot of people call that gold plating.
The setup these prompts leave you with is boring, standard, and surely something I could do myself in a couple of hours. You might even skeleton it, right? The thing is, the AI can do it faster in elapsed time, and it also reduces my time to writing two prompts (<2 minutes) and some review (10-15 minutes, perhaps?).
Also remember this was a simple example; once we get to real business logic efficiencies grow.
It may be a set of preferences for now, but it always grows into a monstrosity when future preferences don't align with current preferences. That's what accidental complexity means. Instead of working on the essential needs (having an admin interface that works well), you will get bogged down with the whims of the platform and technology (breaking changes, bugs, ...). It may not be relevant to you if you're planning on abandoning it (switching jobs, a side project you no longer care about, ...).
Something boring and standard is something that keeps going with minimal intervention while getting better each time.
I'm going to go out on a limb here and say NextJs with Auth.js is pretty boring technology.
I'm struggling to see what you'd choose to do differently here?
Edit: actually I'll go further and say I'm guarding against accidental complexity. For example, Auth.js is really boring technology, but I am annoyed they've deprecated it in favour of Better Auth - it's not better and it is definitely not boring technology!
I respect this take. As I understand it, in SDD the code is not the source of truth; it's akin to bytecode: an intermediary between the spec and the observable behavior.
Phone's dead/dropped under a bus tire/stolen at security/got bit-flipped by a cosmic ray, now what?
Nearly 80% of 207M passengers having already adopted it means roughly 41M passengers have not.
I apologize in advance for being overly dramatic. I just flew with a digital boarding pass and my phone nearly died while waiting at the outlet-less gate. I'm sure I could have gotten assistance, but it was stressful.
If you lose your phone or if it's out of battery, you'll get a free boarding pass at the airport as long as you checked in online before getting to the airport.
If you didn't check in before, you'll have to pay a 50€ fee. (Same as before)
An empty battery is not unlikely in airports that make it extra hard to recharge a phone (looking at you, BER). Long ago it was possible to unplug a vending machine as a last resort, but these days all electrical sockets are concealed, as a measure to prevent overnight stays in airports, I believe.
As a high schooler in 2008, I watched my family struggle. Pay cuts, temporary furloughs, and constant stress became normal. The same for my friends and their families. The vestiges of 90s excess and advancement were over.
At that time I realized my American Dream of becoming an engineer was just that: a dream. A shared illusion we all propped up until we couldn't. So, I turned down engineering schools, took a year off to work in a coffee shop, and went to university for a Bachelor of Fine Arts.
I figured: if I was going to be unemployed and living paycheck to paycheck, I might as well follow my own dreams and try to have fun doing it.
Only a few years after graduating, I'd return to engineering—computer programming instead of robotics—but that experience has always stuck with me.
Almost 20 years later, I feel the same gut-punch as I see what's happening to young people.
ITY621 does a readback: "...climb DLREY," confirming the departure path for 24L, which extends straight ahead before turning left; that differs from 25R, which turns left shortly after takeoff: "...RNAV DOCKR."
https://docs.ruby-lang.org/en/master/box_md.html