
I am used to postmortems posted here being a rare chance for us to take a peek behind the curtain and get a glimpse into things like architecture, monitoring systems, disaster recovery processes, "blameless culture", etc. for large software service companies.

In contrast, I feel like the greatest insight that could be gleaned from this post is that OpenAI uses GPUs.




We also know it uses the GPUs to generate numbers. But these numbers, they were the wrong ones. More technically, part of the computation didn’t work when run on some hardware.


Yeah, definitely opaque. If I had to guess, it sounds like a code optimization that resulted in a numerical error, but only on some GPUs or CUDA versions. I've seen that sort of issue happen a few times in the PyTorch framework, for example.
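
Not the root cause, obviously, but the class of problem is easy to demo. A toy PyTorch snippet (mine, nothing to do with their stack) showing how "the same" matmul drifts depending on which precision/kernels a given GPU and CUDA version pick; on Ampere+ cards even plain fp32 can silently go through TF32 and drift further:

    import torch

    # Toy demo: same matmul, different precisions, different answers.
    torch.manual_seed(0)
    device = "cuda" if torch.cuda.is_available() else "cpu"
    a = torch.randn(1024, 1024, device=device)
    b = torch.randn(1024, 1024, device=device)

    ref  = a.double() @ b.double()                  # fp64 reference
    fp32 = (a @ b).double()
    bf16 = (a.bfloat16() @ b.bfloat16()).double()   # lower-precision path

    print("fp32 max |diff| vs fp64:", (fp32 - ref).abs().max().item())
    print("bf16 max |diff| vs fp64:", (bf16 - ref).abs().max().item())

Usually that drift is harmless; the interesting failures are when an "optimization" pushes it past harmless on only some hardware.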


Yeah, definitely opaque.

I wonder what the AI would say if someone asked it what happened.

It would be pretty funny if it gave a detailed answer.


It will make something up if it answers at all. It doesn’t know.


Someone upthread did just that. It needs some love.

https://news.ycombinator.com/item?id=39462686


It sounds like something went sideways with the embedding mapping. Either some kind of quantization, different rounding, or maybe just an older embedding.
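More guessing, but the rounding part is easy to illustrate with made-up numbers (nothing to do with their actual embeddings): when two candidate tokens are nearly tied, a coarser rounding path can flip which one wins, and one flipped token early on is enough to derail the rest of the reply.

    import torch

    # Two candidates nearly tied; fp16 rounding erases the gap.
    logits = torch.zeros(50_000)
    logits[100] = 8.0004   # intended next token, ahead by a hair
    logits[42]  = 8.0000   # runner-up

    print(int(logits.argmax()))                 # 100 in fp32
    print(int(logits.half().float().argmax()))  # 42: the 0.0004 gap is below fp16 resolution near 8.0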


The point isn't the specifics; the point is that this isn't a postmortem.

A postmortem should be detailed enough for someone to understand the background, how the problem came to be, what happened, and then walk through what has been done such that it won't happen again. It takes … well, at least a page. This is far too short to qualify.

This is more "ugh, here's a rough explanation, please go away now" territory.

OpenAI isn't the first company to abuse the term this way, but it devalues the real PMs out there.


Sorry, not disagreeing, just offering speculation in lieu of the answers we don't have.


That’s not helping, that’s excusing OpenAI’s behavior, which is not something anyone on HN should be doing.

This is supposedly the greatest AI mankind has ever created; it goes down for a little while and we have zero information on why or how. That’s simply inexcusable.

If this is such a socially impactful technology, we should be ripping it to pieces to understand exactly how it works. That's a) how we protect society from technical charlatans and b) how you spawn a whole new world of magnificent innovations (see Linus building a truly free Unix-like operating system for everyone to use).

Failing to hold them to as high a bar is another step down the path to a dystopian corporatist future…


> it goes down for a little while and we have zero information on why or how

We have more than zero information. They applied a change and it didn’t work on some set of their hardware so they reverted it. That is not much information but also not zero.

> that’s simply inexcusable

If your contractual SLAs were violated take it up with the billing department.

> If this is such a socially impacting technical change we should be ripping it to pieces to understand exactly how it works.

And people are doing that. Not by complaining when the corps are not sufficiently forthcoming, but by implementing their own systems. That is how you have any chance of avoiding the dystopian corporatist future you mention.


Would not be the first time "Open"AI abused words.


In my limited experience this screams “applied a generated mask to the wrong data”. Like they scored tokens then applied the results to the wrong source or something. Obviously more an idle guess from first principles than the direct cause, tho
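
Same disclaimer applies, but the failure mode is easy to sketch with toy tensors and invented names: perfectly reasonable scores turn into garbage the moment the selected indices are applied to a buffer they weren't computed against.

    import torch

    # Scores computed against one buffer, indices applied to another.
    torch.manual_seed(0)
    vocab = 100
    scores = torch.randn(vocab)
    right_buffer = torch.arange(vocab)                  # what the scores refer to
    wrong_buffer = right_buffer[torch.randperm(vocab)]  # some other, misaligned buffer

    top = scores.topk(5).indices
    print(right_buffer[top].tolist())   # the tokens the model actually "meant"
    print(wrong_buffer[top].tolist())   # what you get when indices hit the wrong data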


or a manic episode of a caged genius?


I wonder if we'll accidentally gain insight into schizophrenia or other human-neurological disorders from AI crashes/failures.


Someone posted an explanation that lines up with their postmortem: https://news.ycombinator.com/item?id=39450978


How does that line up? OpenAI said they had a bug in certain GPU configurations that caused the token numbers to be wrong which made normal output look like garbage. This post is guessing they set the frequency and presence penalties too high.
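
For reference, the penalties that post is guessing about amount to roughly this logit adjustment (a paraphrase of what OpenAI's API docs describe, not their actual code), which is why cranking them up makes the model avoid anything it has already said rather than emit wrong tokens outright:

    import torch

    def apply_penalties(logits, counts, freq_penalty, presence_penalty):
        # Subtract more from tokens that have already appeared often.
        return logits - counts * freq_penalty - (counts > 0).float() * presence_penalty

    logits = torch.tensor([2.0, 1.5, 1.0])   # scores for three candidate tokens
    counts = torch.tensor([3.0, 1.0, 0.0])   # how often each has appeared so far

    print(apply_penalties(logits, counts, 0.1, 0.1))  # mild: ranking mostly intact
    print(apply_penalties(logits, counts, 2.0, 2.0))  # cranked up: frequent tokens get buried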


ChatGPT had a stroke. Haven't seen that since the 3B parameter models from 8 months ago


They could have said "shit happened" and it would have been as informative tbh.


EDIT: misread


Pretty sure that was his point.



