You focused on writing software, but the real problem is the spec used to produce the software. LLMs will happily hallucinate reasonable but unintended specs, and the checker won't save you because, after all, the software created is correct w.r.t. the spec.
Also, tests and proof checkers only catch what they're asked to check; if the LLM misunderstands intent but produces a consistent implementation+proof, everything "passes" and is still wrong.
This is why every one of my coding agent sessions starts with "... write a detailed spec in spec.md and wait for me to approve it". Then I review the spec, then I tell it "implement with red/green TDD".
Users don't want changes that rapidly. There's not enough people on the product team to design 20x more features. 20x more features means 400x more cross-team coordination. There's only positive marginal ROI for maybe 1.5-2x even if development is very cheap.
The premise is still playing out. We are only at the beginning of the fourth year of this hype phase, and we haven't even reached AGI yet. It's obviously not perfect, and maybe it never will be, but we are not at the point yet where we can conclude which future is true. The singularity hasn't happened yet, so we are still moving at (LLM-enhanced) human speed at the moment, meaning things need time.
Maybe, but you're responding to a thread about why AI might or might not be able to replace an entire engineering team:
> Ultimately I think over the next two years or so, Anthropic and OpenAI will evolve their product from "coding assistant" to "engineering team replacement", which will include standard tools and frameworks that they each specialize in (vendor lock in, perhaps), but also ways to plug in other tech as well.
This is the context of how this thread started, and this is the context in which DrammBA was saying that the spec problem is very hard to fix [without an engineering team].
Might be good to define the (legacy) engineering team. Instead of thinking 0/1 (ugh, almost nothing happens this way), the traditional engineering team may be replaced by something different. A team mostly of product, spec writers, and testers. IDK.
The job of AI is to do what we tell it to do. It can't "create a spec" on its own. If it did and then implemented that spec, it wouldn't accomplish what we want it to accomplish. Therefore we the humans must come up with that spec. And when you talk about a software application, the totality of its spec written out, can be very complex, very complicated. To write and understand, and evolve and fix such a spec takes engineers, or what used to be called "system analysts".
To repeat: specifying what a "system" we want to create does is a highly complicated task, which can only be done by human engineers who understand the requirements for the system, how parts of those requirements/specs interact with other parts of the spec, and what the consequences of one (part of the) spec are for other parts of it. We must not write "impossible specs" like "draw me a round square". Maybe the AI can check whether the spec is impossible or not, but I'm not so sure of that.
So I expect that software engineers will still be in high demand, but they will be much more productive with AI than without it. This means there will be much more software because it will be cheaper to produce. And the quality of the software will be higher in terms of doing what humans need it to do. Usability. Correctness. Evolvability. In a sense the natural language-spec we give the AI is really something written in a very high-level programming-language - the language of engineers.
BTW. As I write this I realize there is no spell-checker integrated into Hacker News. (Or is there?). Why? Because it takes developers to specify and implement such a system - which must be integrated into the current HN implementation. If AI can do that for HN, it can be done, because it will be cheap enough to do it -- if HN can exactly spell out what kind of system it wants. So we do need more software, better software, cheaper software, and AI will help us do that.
A 2nd factor is that we don't really know if a spec is "correct" until we test the implemented system with real users. At that point we typically find many problems with the spec. So somebody must fix the problems with the spec, evolve the spec, and rinse and repeat the testing with real users -- the developers who understand the current spec and why it is not good enough.
AI can write my personal scripts for me surely. But writing a spec for a system to be used by thousands of humans, still takes a lot of (human) work. The spec must work for ALL users. That makes it complicated and difficult to get right.
Same, and similarly something like a "create a holistic design with all existing functionality you see in tests and docs plus new feature X, from scratch", then "compare that to the existing implementation and identify opportunities for improvement, ranked by impact, and a plan to implement them" when the code starts getting too branchy. (aka "first make the change easy, then make the easy change"). Just prompting "clean this code up" rarely gets beyond dumb mechanical changes.
Given so much of the work of managing these systems has become so rote now, my only conclusion is that all that's left (before getting to 95+% engineer replacement) is an "agent engineering" problem, not an AI research problem.
I guess that's arguable; a memory leak can make a system unpleasant to use, although I accept it can be solved by repeatedly restarting the offending app.
Without getting into your specific injury or sport, what was the biggest change compared to the trainer’s program?
Was it something unexpected like "exercise this seemingly unrelated muscle group that has nothing do with your injury but just happens to reduce pain by 75% for some inexplicable reason"?
Or was it something more mundane like "instead of exercising this muscle every day, do it every other day to give it time to rest"?
I'm not entirely sure, but here is my educated guess.
The biggest change was that I spent a lot of time vetting each exercise for my specific injury points and asking whether this was really the best way to work that muscle group. I ended up replacing 60% of the workout with new exercises that allow me to lift more weight or target different muscle groups, while taking pressure off those injury points.
I think I had grown to use more weight with a few exercises that, on paper, shouldn't cause a problem, but were causing more stress on my injury and the supporting muscles. I found ways to isolate those muscles without putting as much tension on that area. I also added more core-strength exercises, including some for the hip flexors, which might be helping support as well. I was likely doing planks for too long, and switched to hardstyle, etc.
Last year, I was pain-free 90% of the year, and most years I run around 95% to 98%. Last year just felt different, and the rehab wasn't working the way it used to. Since switching to this workout about 8 weeks ago, I've been 100% pain-free in a way that is hard to describe. My back has just felt light and happy, and I can jump up on boxes and back down with no worries.
This is on the back of 10 years of rehab, 10 years of education, 10 years of learning about my injury and body, etc. AI is not some magic button to all the people who might jump on this thread :), it's a tool, and I want to stress that. But I've tried to do this in years past, and I couldn't do it. This was a game-changer. I tried with ChatGPT3 and it was useless at the time as well.
Funny thing: I went down a rabbit hole, because I first scanned the open PRs and saw a PR to enable universal builds to support Intel Macs, but the whole thing was pure AI slop, and someone commented that codexbar already supports Intel; sure enough, v.15 added it (the AI slop PR completely missed that). I then looked into the cask script, and it has a hardcoded dependency on arm which prevents brew from installing v.17 even though it has been a universal binary since v.15.
> Today at CES, Intel unveiled Intel Core Ultra Series 3 processors, the first AI PC platform built on Intel 18A process technology that was designed and manufactured in the United States. Powering over 200 designs from leading, global partners, Series 3 will be the most broadly adopted and globally available AI PC platform Intel has ever delivered.
What in the world is this disaster of an opening paragraph? From the weird "AI PC platform" (not sure what that is) to the "will be the most broadly adopted and globally available AI PC platform" (is that a promise? a prediction? a threat?).
And you just gotta love the processor names "Intel Core Ultra Series 3 Mobile X9/X7"
I think I have given up on chip naming. I honestly can't tell anymore, there are so many modifiers on the names these days. I assume 9 is better than 7, right? Right?
Oh, the number of times I’ve heard someone assume their five- or ten-year-old machine must be powerful because it’s an i7… no, the i3-14100 (released two years ago) is uniformly significantly superior to the i7-9700 (released five years before that), and only falls behind the i9-9900 in multithreaded performance.
Within the same product family and generation, I expect 9 is better than 7, but honestly it wouldn’t surprise me to find counterexamples.
>>Within the same product family and generation, I expect 9 is better than 7
Ah the good old Dell laptop engineering, where the i9 is better on paper, but in reality it throttles within 5 seconds of starting any significant load and the cpu nerfs itself below even i5 performance. Classic Dell move.
Apple had the same problem before they launched the M1. Unless your workloads are extremely bursty the i9 MacBook is almost guaranteed to be slower than the base i7.
The latest iPhone base model performs better than the iPhone Air despite the latter having a Pro chip, because that Pro is so badly throttled due to the device form factor.
Are they throttling with the fan off? Because I don't recall ever hearing the fan on my M3 Max 14" (granted, no heavy deliberate computation beyond regular dev work).
AFAIK it’s only something that happens under sustained heavy load. The 14” Max should still outperform the Pro for shorter tasks but I’d reckon few people buy the most expensive machine for such use cases.
Personally I think that Apple should not even be selling the 14” Max when it has this defect.
Within the same family and generation, I don’t think this should happen any more. But especially in the past, some laptops were configurable with processors of different generations or families (M, Q, QM, U, so many possibilities) so that the i7 option might have worse real-world performance than the i5 (due to more slower cores).
It's been a cooling problem on a lot of i9 laptops... the CPU will hit thermal peaks, then throttle down, which has an incredibly janky feel as a user... then it spins back up, and down... the performance curve is just wacky in general.
Today is almost worse, as the thermal limits will be set entirely different between laptop vendors on the same chips, so you can't even have apples to apples performance expectations from different vendors.
Same for the later generation Intel Macbook Pros... The i9 was so bad, and the throttling made it practically unusable for me. If it weren't a work issued laptop, I'd have either returned it, or at least under-volted and under-clocked it so it didn't hiccup every time I did anything at all.
I had an X1 Carbon like this, only it'd crash for no apparent reason. The internet consensus that Lenovo wouldn't own up to was that the i7 CPUs were overpowered for the cooling, so your best bet is either underclocking them or getting an i5.
Yeah, putting an i9 in any laptop that's not an XL gaming rig with big fans is very nearly always a waste of money (there might exist a few rare exceptions for some oddball workloads). Manufacturers selling i9s in thin & light laptops at an ultra price premium may fall just short of the legal definition of fraud but it's as unconscionable as snake-oil audiophile companies selling $500 USB cables.
Tbf 2 jobs ago I had a Dell enterprise workstation laptop, an absolute behemoth of a thing, it was like 3.5kg, it was the thicker variant of the two available with extra cooling, specifically sold to companies like ours needing that extra firepower, and it had a 20 core i9, 128GB of DDR5 CAMM ram, and a 3080Ti - I think the market price of that thing was around £14k, it was insane. And it had exactly that kind of behaviour I described - I would start compiling something in Visual Studio, I would briefly see all cores jump to 4GHz and then immediately throttle down to 1.2GHz, to a point where the entire laptop was unresponsive while the compilation was ongoing. It was a joke of a machine - I think that's more of a fraud than what you described, because companies like ours were literally buying hundreds of these from Dell and they were literally unsuitable for their advertised use.
(to add insult to the injury - that 3080Ti was literally pointless as the second you started playing any game the entire system would throttle so hard you had extreme stuttering in any game, it was like driving a lamborghini with a 5 second fuel reserve. And given that I worked at a games studio that was kinda an essential feature).
That's still assigning too much significance to the "i9" naming. Sometimes, the only difference between the i9 part and the top i7 part was something like 200MHz of single-core boost frequency, with the core counts and cache sizes and maximum power limit all being equal. So quite often, the i7 has stood to gain just as much from a higher-power form factor as the i9.
A machine learning model can place a CPU on the versioning manifold but I'm not confident that it could translate it to human speech in a way that was significantly more useful than what we have now.
At best, 14700KF-Intel+AMD might yield relevant results.
AI PC has been in the buzz for more than 2 years now (despite being a near-useless concept), and Intel has something like 75% market share in laptops. Both of those are well within the norm for an Intel marketing piece.
It's not really meant for consumers. Who would even visit newsroom.intel.com?
An AI PC has a CPU, a GPU and an NPU, each with specific AI acceleration capabilities. An NPU, or neural processing unit, is a specialized accelerator that handles artificial intelligence (AI) and machine learning (ML) tasks right on your PC instead of sending data to be processed in the cloud.
https://newsroom.intel.com/artificial-intelligence/what-is-a...
It'd be interesting to see some market survey data showing the number of AI laptops sold & the number of users that actively use the acceleration capabilities for any task, even once.
Remove background from an image. Summarize some text. OCR to select text or click links in a screenshot. Relighting and centering you in your webcam. Semantic search for images and files.
A lot of that is in the first party Mac and Windows apps.
> Are ZBooks good or do I want an OmniBook or ProBook? Within ZBook, is Ultra or Fury better? Do I want a G1a or a G1i? Oh you sell ZBook Firefly G11, I liked that TV show, is that one good?
Apple is very consistent. You have the MacBook Air (lighter, more portable variant) and the MacBook Pro (more expensive and powerful variant). They don’t mess around with model numbers.
Apple is so "consistent" the way to know which kind of an Air or Pro it is, is to find the tiny print on the bottom that's a jumble of letters like "MGNE3" and google it.
And depending on what you're trying to use it for, you need to map it to a string like "MacBookAir10,1" or "A2337" or "Macbook Air Late 2022".
Oh also the Macbook Air (2020) is a different processor architecture than Macbook Air (2020).
The canonical way if you need a version number is the "about this Mac" dialog (here it says Mac Studio 2022).
If you need to be technical, System Information says Mac13,1 and these identifiers have been extremely consistent for about 30 years.
Your product number encodes much more information than that, and about the only time when it is actually required is to see whether it is eligible for a recall.
> Oh also the Macbook Air (2020) is a different processor architecture than Macbook Air (2020).
Right, except that one is MacBook Air (Retina, 2020), MacBookAir9,1, and the other is MacBook Air (M1, 2020), MacBookAir10,1. It happens occasionally, but the fact that you had to go back 5 years to a period in which the lineup underwent a double transition speaks volumes.
> Apple is very consistent. You have the MacBook Air (lighter, more portable variant) and the MacBook Pro (more expensive and powerful variant).
What about the iBook? That wasn’t tidy. Ebooks or laptops?
Or the iPhone 9? That didn’t exist.
Or MacOS? Versioning got a bit weird after 10.9, due to the X thing.
They do mess around with model numbers and have just done it again with the change to year numbers. I don’t particularly care but they aren’t all clean and pure.
> What about the iBook? That wasn’t tidy. Ebooks or laptops?
Back then, there were iBooks (entry-level) and PowerBooks (professional, high performance and expensive). There had been PowerBooks since way back in 1991, well before any ebook reader. I am not sure what your gripe is.
> Or the iPhone 9? That didn’t exist.
There’s a hole in the series. In what way is it a problem, and how on earth is it similar to the situation described in the parent?
> Or MacOS? Versioning got a bit weird after 10.9, due the X thing.
It never got weird. After 10.9.5 came 10.10.0. Version numbers are not decimals.
Seriously, do you have a point apart from "Apple bad"?
I’m not sure I hear people call MacOS X 10.10 “ten ten ten”. I think I remember them calling it “ten ten” verbally.
So you’d say “MacOS ten ten”.
At least that’s what I’m used to, it is entirely possible that’s what other people said and you would write it that way. No one wrote “MacOS X.10” or “MacOS X .10” but they would write “MacOS X 10.10”.
So yeah it's all a bit of a mess. There's a reason people often use the name of the release, like Snow Leopard or Tahoe, instead of the numbers.
"iBook" referred to a laptop from 1999 to 2006. "iBooks" referred to the eBook reader app and store from 2010 to 2019. I'll grant that there is some possibility for confusion, but only if the context of the conversation spans multiple decades but doesn't make it clear whether you're talking about hardware or software.
Back when there were MacBooks, it was MacBook (standard model), MacBook Air (lighter variant), and MacBook Pro (more expensive, high-performance variant). Sure, 3 is more complicated than 2, but come on.
If you really want to complain, you can go back to the first unibody MacBook, which did not fit that pattern, or the interim period when high-DPI displays were being rolled out progressively, but let's be serious. The fact is that even at the worst of times their range could be described in 2 sentences. Now, try to do that for any other computer brand. To my knowledge, the only other one with an understandable lineup was Microsoft, before they lost interest.
> The fact is that even at the worst of times their range could be described in 2 sentences.
It’s a good time to buy one. They are all good.
It would be interesting to know how many SKUs are hidden behind the simple purchase interface on their site. With the various storage and colour options, it must be over 30.
Loads, I assume. But those are things like "MacBook Pro M1 Max with a 1TB SSD and a matte screen coating" versus "MacBook Pro M1 with a 256GB SSD and a standard screen". The granularity of say Dell’s product numbers is not enough for that either, and you still need a long product number when searching their knowledge base.
Intel marketing isn’t the best but I am struggling to understand what issue you’re taking with this.
It’s an AI PC platform. It can do AI. It has an NPU and integrated GPU. That’s pretty straightforward. Competitors include Apple silicon and AMD Ryzen AI.
They’re predicting it’ll sell well, and they have a huge distribution network with a large number of partner products launching. Basically they’re saying every laptop and similar device manufacturer out there is going to stuff these chips in their systems. I think they just have some well-placed confidence in the laptop segment, because it’s supposed to combine the strong efficiency of the 200 series with the kind of strong performance that can keep up with or exceed competition from AMD’s current laptop product lineup.
Their naming sucks but nobody’s really a saint on that.
I can't believe we're still putting NPUs into new designs.
Silicon taken up that could've been used for a few more compute units on the GPU, which is often faster at inference anyway and way more useful/flexible/programmable/documented.
You can thank Microsoft for that. Intel architects in fact did not want to waste area on an NPU. That caused Microsoft to launch their AI-whatever branded PCs with Qualcomm, who were happy to throw in whatever Microsoft wanted in order to be the launch partner. After that, Intel had to follow suit to make Microsoft happy.
That doesn’t explain why Apple “wastes” die area on their NPU.
The thing is, when you get an Apple product and you take a picture, those devices are performing ML tasks while sipping battery life.
Microsoft maybe shouldn’t be chasing Apple especially since they don’t actually have any marketshare in tablets or phones, but I see where they’re getting at: they are probably tired of their OS living on devices that get half the battery life of their main competition.
And here’s the thing, Qualcomm’s solution blows Intel out of the water. The only reason not to use it is because Microsoft can’t provide the level of architecture transition that Apple does. Apple can get 100% of their users to switch architecture in about 7 years whenever they want.
Bingo. Maybe Microsoft shouldn’t even be chasing them but I think they have a point to try and stay competitive. They can’t just have their OS getting half the battery life of their main competitor.
When you use an Apple device, it’s performing ML tasks while barely using any battery life. That’s the whole point of the NPU. It’s not there to outperform the GPU.
Every modern chip needs some percentage dedicated to dark silicon. There is no cheating the thermal reality. You could add more compute units in the GPU, but you then have to make up for it somewhere else. It’s a balancing act.
The Core Ultra lineup is supposed to be low-power, low-heat, right? If you want more compute power, pick something from a different product series.
> Every modern chip needs some percentage dedicated to dark silicon. There is no cheating the thermal reality. You could add more compute units in the GPU, but you then have to make up for it somewhere else. It’s a balancing act.
I think that "dark silicon" mentality is mostly lingering trauma from when the industry first hit a wall with the end of Dennard scaling. These days, it's quite clear that you can have a chip that's more or less fully utilized, certainly with no "dark" blocks that are as large as a NPU. You just need to have the ability to run the chip at lower clock speeds to stay within power and thermal constraints—something that was not well-developed in 2005's processors. For the kind of parallel compute that GPUs and NPUs tackle, adding more cores but running them at lower clock speeds and lower voltages usually does result in better efficiency in practice.
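To put rough numbers on that wide-and-slow point, here is a back-of-the-envelope sketch using the usual dynamic-power approximation (the 20% voltage drop at half clock is a made-up illustrative figure, not a measured one):

    P_dynamic ≈ C · V^2 · f

    1 unit  at f,   V:      throughput ∝ f            power ∝ 1.00 · C·V^2·f
    2 units at f/2, 0.8·V:  throughput ∝ 2·(f/2) = f  power ∝ 2 · (0.8)^2 · (1/2) = 0.64 · C·V^2·f

Same throughput for roughly a third less dynamic power, which is why adding units and dropping clocks/voltage tends to win on efficiency (ignoring leakage and the extra die area).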
The real answer to the GPU vs NPU question isn't that the GPU couldn't grow, but that the NPU has a drastically different architecture making very different power vs performance tradeoffs that theoretically give it a niche of use cases where the NPU is a better choice than the GPU for some inference tasks.
It's... the launch vehicle for a new process. Literally the opposite of "cost cutting", they went through the trouble of tooling up a whole fab over multiple years to do this.
Will 18A beat TSMC and save the company? We don't know. But they put down a huge bet that it would, and this is the hand that got dealt. It's important, not something to be dismissed.
Lunar Lake integrated DRAM on the package, which was faster and more power efficient, this reverts that. They also replaced part of the chip from being sourced from TSMC to from themselves. And if their foundry is competitive, they should be shaking other foundry customers down the way TSMC is.
If they have actually mostly caught up to TSMC, props, but also, I wish they hadn't given up on EUV for so long. Instead they decided to ship chips overclocked so high they burn out in months.
> Lunar Lake integrated DRAM on the package, which was faster and more power efficient, this reverts that.
On-package memory is slightly more power efficient, but it isn't any faster; it still uses industry-standard LPDDR. And Panther Lake supports faster LPDDR than Lunar Lake, so it's definitely not a regression.
I don't see how any of that substantiates "Panther Lake and 18A are just cost cutting efforts vs. Lunar Lake". It mostly just sounds like another boring platform flame.
Again, you're talking about the design of Panther Lake, the CPU IC. No one cares, it's a CPU. The news here is the launch of the Intel 18A semiconductor process and the discussion as to if and how it narrows or closes the gap with TSMC.
Trying to play this news off as "only cost cutting" is, to be blunt, insane. That's not what's happening at all.
I'm not GP, but I think that it really doesn't matter if Intel is able to sell this process to other companies. But if they're only producing their own chips on it, that's quite a valid criticism.
And for the fourth time, it may be a valid "criticism" in the sense of "Does Intel Suck or Rule?". It does not validate the idea that this product release, which introduces the most competitive process from this company in over a decade, is merely a "cost reduction" change.
It's only as exciting as a cost reduction because they're playing catch-up by trying to not need to outsource their highest performance silicon. Let me know when Intel gets perf/watt to be high enough to be of interest to Apple, gamers, or anyone who isn't just buying a basic PC because their old one died, or an Intel server because that's what they've always had.
Every single performance figure in TFA is compared to their own older generations, not to competitors.
Putting on my CISO hat, if they release the source, someone else could then create an app, but this time maliciously with said exfiltration of information, and publish it on play with paid ad time.
You seem to be confused about your terms: both SSR and SSG can rehydrate and become interactive. You only need SSR if you have personalized content that must be fetched on an actual user request, and with frameworks like Astro introducing the islands concept, you can even mix SSG and SSR content on a single page.
That depends on how you interpret "static render".
I did not interpret that as React SSG. SSG is the default behavior of NextJS unless you dynamically fetch data, turning it into SSR automatically.
What I thought of is React's "renderToString()" at build time which will produce static HTML with event handlers stripped, in preparation for a later "hydrateRoot()" on the client side.
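For anyone who hasn't used those APIs, a minimal sketch of that split (App and the "root" element id are just placeholders, not anything from the thread):

    // build.ts - run at build time: produce static HTML, event handlers stripped
    import { renderToString } from "react-dom/server";
    import { createElement } from "react";
    import { App } from "./App"; // placeholder component
    const html = renderToString(createElement(App));
    // write `html` into a static page template, e.g. inside <div id="root">

    // client.tsx - runs in the browser: attach event handlers to that markup
    import { hydrateRoot } from "react-dom/client";
    import { createElement } from "react";
    import { App } from "./App";
    hydrateRoot(document.getElementById("root")!, createElement(App));

NextJS's SSG/SSR machinery is obviously more involved, but conceptually it's this same split: render to HTML ahead of time (or per request, for SSR), then hydrate on the client.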
What's concerning to many of us is that you (and others) have said this same thing s/Opus 4.5/some other model/
That feels more like chasing than a clear line of improvement. It reads very differently from something like "my habits have changed quite a bit since reading The Art of Computer Programming". They're categorically different.
It's because the models keep getting better! What you could do with GPT-4 was more impressive than what you could do with GPT 3.5. What you could do with Sonnet 3.5 was more impressive yet, and Sonnet 4, and Sonnet 4.5.
Some of these improvements have been minor, some of them have been big enough to feel like step changes. Sonnet 3.7 + Claude Code (they came out at the same time) was a big step change; Opus 4.5 similarly feels like a big step change.
If you're sincerely trying these models out with the intention of seeing if you can make them work for you, and doing all the things you should do in those cases, then even if you're getting negative results somehow, you need to keep trying, because there will come a point where the negative turns positive for you.
If you're someone who's been using them productively for a while now, you need to keep changing how you use them, because what used to work is no longer optimal.
Models keep getting better but the argument I'm critiquing stays the same.
So does the comment I critiqued in the sibling comment to yours. I don't know why it's so hard to believe we just haven't tried. I have a Claude subscription. I'm an ML researcher myself. Trust me, I do try.
But that last part also makes me keenly aware of their limitations and failures. Frankly I don't trust experts who aren't critiquing their field. Leave the selling points to the marketing team. The engineer and researcher's job is to be critical. To find problems. I mean how the hell do you solve problems if you're unable to identify them lol. Let the marketing team lead development direction instead? Sounds like a bad way to solve problems
> benchmark shows huge improvements
Benchmarks are often difficult to interpret. It is really problematic that they got incorporated into marketing. If you don't understand what a benchmark measures, and more importantly, what it doesn't measure, then I promise you that you're misunderstanding what those numbers mean.
For METR I think they say a lot right here (emphasis my own) that reinforces my point
> Current frontier AIs are vastly better than humans at text prediction and knowledge tasks. They outperform experts on most *exam-style problems* for a fraction of the cost. ... And yet the best AI agents are not currently able to carry out substantive projects by themselves or directly substitute for human labor. *They are unable to reliably handle even relatively low-skill*, computer-based work like remote executive assistance. It is clear that capabilities are increasing very rapidly in some sense, but it is unclear how this corresponds to real-world impact.
So make sure you're really careful to understand what is being measured. What improvement actually means. To understand the bounds.
It's great that they include longer tasks but also notice the biases and distribution in the human workers. This is important in properly evaluating.
Also remember what exactly I quoted. For a long time we've all known that being good at leetcode doesn't make one a good engineer. But it's an easy thing to test and the test correlates with other skills that are likely to be learned to be good at those tests (despite being able to metric hack). We're talking about massive compression machines. That pattern match. Pattern matching tends to get much more difficult as task time increases but this is not a necessary condition.
Treat every benchmark adversarially. If you can't figure out how to metric hack it, then you don't know what the benchmark is measuring (and just because you know what can hack it doesn't mean you understand it, nor that that's what is being measured).
I think you should ask yourself: If it were true that 1) these things do in fact work, 2) these things are in fact getting better... what would people be saying?
The answer is: Exactly what we are saying. This is also why people keep suggesting that you need to try them out with a more open mind, or with different techniques: Because we know with absolute first-person iron-clad certainty what is possible, and if you don't think it's possible, you're missing something.
It seems to be "people keep saying the models are good"?
That's true. They are.
And the reason people keep saying it is because the frontier of what they do keeps getting pushed back.
Actual, working, useful code completion in the GPT 4 days? Amazing! It could automatically write entire functions for me!
The ability to write whole classes and utility programs in the Claude 3.5 days? Amazing! This is like having a junior programmer!
And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!
But now we are beginning to see that programming in 6 months' time might look very different to now because these AI systems code very differently to us. That's exactly the point.
So what is it you are arguing against?
I think you said you didn't like that people are saying the same thing, but in this post it seems more complicated?
> And now, with Opus 4.5 or Codex Max or Gemini 3 Pro we can write substantial programs one-shot from a single prompt and they work. Amazing!
People have been doing this parlor trick with various "substantial" programs [1] since GPT 3. And no, the models aren't better today, unless you're talking about being better at the same kinds of programs.
[1] If I have to see one more half-baked demo of a running game or a flight sim...
It’s a vague statement that I obviously cannot defend in all interpretations, but what I mean is: the performance of models at making non-trivial applications end-to-end, today, is not practically better than it was a few years ago. They’re (probably) better at making toys or one-shotting simple stuff, and they can definitely (sometimes) crank out shitty code for bigger apps that “works”, but they’re just as terrible as ever if you actually understand what quality looks like and care to keep your code from descending into entropy.
I think "substantial" is doing a lot of heavy lifting in the sentence I quoted. For example, I’m not going to argue that aspects of the process haven’t improved, or that Claude 4.5 isn't better than GPT 4 at coding, but I still can’t trust any of the things to work on any modestly complex codebase without close supervision, and that is what I understood the broad argument to be about. It's completely irrelevant to me if they slay the benchmarks or make killer one-shot N-body demos, and it's marginally relevant that they have better context windows or now hallucinate 10% less often (in that they're more useful as tools, which I don't dispute at all), but if you want to claim that they're suddenly super-capable robot engineers that I can throw at any "substantial" problem, you have to bring evidence, because that's a claim that defies my day-to-day experience. They're just constantly so full of shit, and that hasn't changed, at all.
FWIW, this line of argument usually turns into a motte-and-bailey fallacy, where someone makes an outrageous claim (e.g. "models have recently gained the ability to operate independently as a senior engineer!"), and when challenged on the hyperbole, retreats to a more reasonable position ("Claude 4.5 is clearly better than GPT 3!"), but with the speculative caveat that "we don't know where things will be in N years". I'm not interested in that kind of speculation.
Have you spent much time with Codex 5.1 or 5.2 in OpenAI Codex, or Claude Opus 4.5 in Claude Code, over the last ~6 weeks?
I think they represent a meaningful step change in what models can build. For me they are the moment we went from building relatively trivial things unassisted to building quite large and complex systems that take multiple hours, often still triggered by a single prompt.
- A WebAssembly runtime in Python which I haven't yet published
The above projects all took multiple prompts, but were still mostly built by prompting Claude Code for web on my iPhone in between Christmas family things.
I'm not confident any of these projects would have worked with the coding agents and models we had had four months ago. There is no chance they would've worked with the January 2025 available models.
I’ve used Sonnet 4.5 and Codex 5 and 5.1, but not in their native environment [1].
Setting aside the fact that your examples are mostly “replicate this existing thing in language X” [2], again, I’m not saying that the models haven’t gotten better at crapping out code, or that they’re not useful tools. I use them every day. They're great tools, when someone actually intelligent is using them. I also freely concede that they're better tools than a year ago.
The devil is (as always) in the details: how many prompts did it take? what exactly did you have to prompt for? how closely did you look at the code? how closely did you test the end result? Remember that I can, with some amount of prompting, generate perfectly acceptable code for a complex, real-world app, using only GPT 4. But even the newest models generate absolute bullshit on a fairly regular basis. So telling me that you did something complex with an unspecified amount of additional prompting is fine, but not particularly responsive to the original claim.
[1] Copilot, with a liberal sprinkling of ChatGPT in the web UI. Please don’t engage in “you’re holding it wrong” or "you didn't use the right model" with me - I use enough frontier models on a regular basis to have a good sense of their common failings and happy paths. Also, I am trying to do something other than experiment with models, so if I have to switch environments every day, I’m not doing it. If I have to pay for multiple $200 memberships, I’m not doing it. If they require an exact setup to make them “work”, I am unlikely to do it. Finally, if your entire argument here hinges on a point release of a specific model in the last six weeks…yeah. Not gonna take that seriously, because it's the same exact argument, every six weeks. </caveats>
[2] Nothing really wrong with this -- most programming is an iterative exercise of replicating pre-existing things with minor tweaks -- but we're pretty far into the bailey now, I think. The original argument was that you can one-shot a complex application. Now we're in "I can replicate a large pre-existing thing with repeated hand-holding". Fine, and completely within my own envelope for model performance, but not really the original claim.
I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?
Copilot style autocomplete or chatting with a model directly is an entirely different experience from letting the model spend half an hour writing code, running that code and iterating on the result uninterrupted.
Here's an example where I sent a prompt at 2:38pm and it churned away for 7 minutes (executing 17 bash commands), then I gave it another prompt and it churned for half an hour and shipped 7 commits with 160 passing tests: https://static.simonwillison.net/static/2025/claude-code-mic...
> I know you said don't engage in "you're holding it wrong"... but have you tried these models running in a coding agent tool loop with automatic approvals turned on?
edit: I wrote a different response here, then I realized we might be talking about different things.
Are you asking if I let the agents use tools without my prior approval? I do that for a certain subset of tools (e.g. run tests, do requests, run queries, certain shell commands, even use the browser if possible), but I do not let the agents do branch merges, deploys, etc. I find that the best models are just barely good enough to produce a bad first draft of a multi-file feature (e.g. adding an entirely new controller+view to a web app), and I would never ever consider YOLOing their output to production unless I didn't care at all. I try to get to tests passing clean before even looking at the code.
Also, while I am happy to let Copilot burn tokens in this manner and will regularly do it for refactors or initial drafts of new features, I'm honestly not sure if the juice is worth the squeeze -- I still typically have to spend substantial time reworking whatever they create, and the revision time required scales with the amount of time they spend spinning. If I had to pay per token, I'd be much more circumspect about this approach.
Yes, that's what I meant. I wasn't sure if you meant classic tab-based autocomplete or the tool-based Copilot agent.
Letting it burn tokens on running tests and refactors (but not letting it merge branches or deploy) is the thing that feels like a huge leap forward to me. We are talking about the same set of capabilities.
For me it is something I can describe in a single casual prompt.
For example I wrote a fully working version of https://tools.nicklothian.com/llm_comparator.html in a single prompt. I refined it and added features with more prompts, but it worked from the start.
Good question. No strict line, and it's always going to be subjective and a little bit silly to categorize, but when I'm debating this argument I'm thinking: a product that does not exist today (obviously many parts of even a novel product will be completely derivative, and that's fine), with multiple views, controllers, and models, and a non-trivial amount of domain-specific business logic. Likely 50k+ lines of code, but obviously that's very hand-wavy and not how I'd differentiate.
Think: SaaS application that solves some domain-specific problem in corporate accounting, versus "in-browser spreadsheet", or "first-person shooter video game with AI, multi-player support, editable levels, networking and high-resolution 3D graphics" vs "flappy bird clone".
When you're working on a product of this size, you're probably solving problems like the ones cited by simonw multiple times a week, if not daily.
But re-reading your statement, you seem to be claiming that there are no 50k-line SaaS apps that are built even using multi-shot techniques (i.e., building a feature at a time).
- It's 45K of python code
- It isn't a duplicate of another program (indeed, the reason it isn't finished is because it is stuck between ISO Prolog and SWI Prolog and I need to think about how to resolve this, but I don't know enough Prolog!)
- Not a *single* line of code is hand written.
Ironically this doesn't really prove that the current frontier models are better because large amounts of code were written with non-frontier models (You can sort of get an idea of what models were used with the labels on https://github.com/nlothian/Vibe-Prolog/pulls?q=is%3Apr+is%3...)
But - importantly - this project is what convinced me that the frontier models are much better than the previous generation. There were numerous times I tried the same thing in a non-Frontier model which couldn't do it, and then I'd try it in Claude, Codex or Gemini and it would succeed.
Is there an endpoint for AI improvement? If we can go from functions to classes to substantial programs then it seems like just a few more steps to rewriting whole software products and putting a lot of existing companies out of business.
"AI, I don't like paying for my SAP license, make me a clone with just the features I need".
- Models keep getting better[0]
- Models since GPT 3 are able to replace junior developers
It's true that both of these can be true at the same time, but they are still in tension. We're not seeing agents ready to replace mid-level engineers, and quite frankly I've yet to see a model actually ready to replace juniors. Possibly low-end interns, but the major utility of interns is a trial run for employment. Frankly, it still seems like interns and juniors are advancing faster than these models in the type of skills that matter for companies (not to mention that institutional knowledge is quite valuable). But there are interns who started when GPT 3.5 came out who are seniors now.
The problem is we've been promised that these employees would be replaced[1] any day now, yet that's not happening.
People forget, it is harder to advance when you're already skilled. It's not hard to go from non-programmer to junior level. Hard to go from junior to senior. And even harder to advance to staff. The difficulty level only increases. This is true for most skills, and this is where there's a lot of naivety. We can be advancing faster while the actual capabilities begin to crawl forward rather than leap.
[0] Implication is not just at coding test style questions but also in more general coding development.
[1] Which has another problem in the pipeline. If you don't have junior devs and are unable to replace both mid and seniors by the time that a junior would advance to a senior then you have built a bubble. There's a lot of big bets being made that this will happen yet the evidence is not pointing that way.
Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.
Why do you use the word "chasing" to describe this? I don't understand. Maybe you should try it and compare it to earlier models to see what people mean.
> Why do you use the word "chasing" to describe this?
I think you'll get the answer to this if you read my comment and your response to understand why you didn't address mine.
Btw, I have tried it. It's annoying that people think the problem is not trying. It was getting old when GPT 3.5 came out. Let's update the argument...
Looking forward to hearing about how you're using Opus 4.5, from my experience and what I've heard from others, it's been able to overcome many obstacles that previous iterations stumbled on
Please do. I'm trying to help other devs in my company get more out of agentic coding, and I've noticed that not everyone is defaulting to Opus 4.5 or even Codex 5.2, and I'm not always able to give good examples to them for why they should. It would be great to have a blog post to point to…
> Opus 4.5 is categorically a much better model from benchmarks and personal experience than Opus 4.1 & Sonnet models. The reason you're seeing a lot of people wax about O4.5 is that it was a real step change in reliable performance. It crossed for me a critical threshold in being able to solve problems by approaching things in systematic ways.
Reality is we went from LLMs as chatbots editing a couple of files per request with decent results, to running multiple coding agents in parallel to implement major features based on a spec document and some clarifying questions - in a year.
Even IF LLMs don't get any better, there is a mountain of lemons left to squeeze in their current state.