I think your questions all grew up in a world where the people operating the thing knew some rationalist who could think deductively about its operation.

But neural networks... they're an exercise in empiricism. We only ever understood that it works, never why. It's sort of a miracle that it doesn't produce buggy output all the time.

What do you tell people when they want to know why the miracles have stopped? Root cause: the gods are angry.



I’d genuinely expect the people who built and operate the thing to have a far better write-up than what amounted to “it no worked lol”. Sure, NNs are opaque, but this org paints itself as the herald and shepherd of AI, and they just produced a write-up that’s hardly worthy of a primary-school child’s account of their recent holiday.


"More technically, inference kernels produced incorrect results when used in certain GPU configurations."

As someone who learned to read in part with the Commodore 64 user manual telling me about PEEK and POKE while I was actually in primary school, I think you're greatly overstating what primary school children write about in their holidays.

Snark aside, is their message vague? Sure. But more words wouldn't actually tell us more unless they also published a lot more about their setup — "we were casting float16 to uint16 on line 249 of server_config.c" isn't going to really help either.

Also, here's a recent security update from Apple to compare against: https://support.apple.com/en-gb/HT214056


Those at least link back to a CVE, which often does have all the gory technical details.

I think your counter-example swings too far in the other direction. Nobody expects a git diff of the fix, but a solid explanation of the whys and wherefores isn’t unreasonable. Cloudflare does it, fly.io does it, etc.


Random sampling issues due to problems with the inference kernels on certain GPU configurations. This seems like a clear root cause and has nothing to do with the magic of NNs. I don’t understand what the fuss is about.


They just want to reinforce their own bias that OpenAI BAD and DUMB, rationalism GOOD, when it’s really their own fault for not understanding enough theory to know what happens if there’s a loss of precision in the predicted embedding vector that maps to a token. If enough decimal places are lopped off or nudged, the predicted vector moves slightly away from where it should be and you get a nearby token instead. Instant aphasia. The report said it was a GPU configuration problem, so my guess is the wrong precision was used on some number of GPUs, but I have no idea how they configure their cluster, so take that with a giant grain of salt.
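
To make that mechanism concrete, here's a toy NumPy sketch (nothing from the report; the weights are invented purely for illustration): when two logits are nearly tied, running the computation at a coarser precision can change which token comes out on top, so the sampler emits a nearby-but-wrong token.

    import numpy as np

    # Toy "unembedding" rows for two candidate tokens, plus a hidden state.
    # In float32, token B's logit narrowly beats token A's.
    h = np.array([1.0, 1.0], dtype=np.float32)
    W = np.array([[2.0,    1.0006],   # token A
                  [1.5004, 1.5004]],  # token B
                 dtype=np.float32)

    logits_fp32 = W @ h
    print(logits_fp32, "argmax:", int(np.argmax(logits_fp32)))  # token B wins

    # Stand-in for a lower-precision kernel: round the weights to float16.
    W_fp16 = W.astype(np.float16).astype(np.float32)
    logits_fp16 = W_fp16 @ h
    print(logits_fp16, "argmax:", int(np.argmax(logits_fp16)))  # flips to token A

Do something like that across a whole vocabulary on every decoding step and you get fluent-looking but wrong output.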


Agreed, but I think they raise a fair point that some kind of automated testing probably should've existed to look for sudden major changes in prompt -> output production between versions, with lots of prompt test cases. Maybe this testing did exist but this problem just didn't surface, for some reason?


> so my guess is the wrong precision was used in some number of GPUs

Well we wouldn’t have to guess if their PM wasn’t so pointlessly vague would we?


Yep. Some people obviously made up their minds before the page even rendered.


If that’s what it was, they’ve done a fabulously bad job of conveying it, and made no attempt to dig into why _that_ happened. Which, again, is like saying “well, it broke because it broke”, with not even so much as a “this can happen because some bit of hardware had a cosmic bit flip and freaked out; it generally happens with a probability of 1:x”.


Well, the YouTube app changelog on iOS has used the same template for years: "fixed space-time continuum". It's a trend: pretend users are too dumb to understand the complexity of things, do some handwaving, and that's it.


Those splines don't reticulate themselves, you know.


You have high expectations of the people who you would typically run across in this space.


Case in point: one of the root system prompts leaked recently, and it's pretty clear the "laziness" on a number of fronts is a direct result of the root system prompt.


I know about the leaked prompts and the laziness issues, but haven't read the prompt. What specifically about the prompt changes do you feel have led to laziness?


The way companies choose to portray themselves is a rather unreliable way of predicting their actions.


There are surely reasonable ways to smoke test changes to the extent that they would catch the issue that came up here.

E.g.: have a gauntlet of 20 moderate-complexity questions with machine-checkable characteristics in the answer. A couple may fail incidentally now and then, but if more than N/20 fail you know something's probably gone wrong.
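
A minimal sketch of what that gauntlet could look like (ask_model is just a placeholder for whatever inference endpoint you'd actually call; the prompts and checks are invented for illustration):

    from typing import Callable

    # Each entry: (prompt, machine-checkable predicate over the response text).
    GAUNTLET: list[tuple[str, Callable[[str], bool]]] = [
        ("What is 17 * 23? Reply with just the number.",
         lambda out: "391" in out),
        ("Name the capital of France in one word.",
         lambda out: "paris" in out.lower()),
        ("Reply with the string OK and nothing else.",
         lambda out: out.strip().upper() == "OK"),
        # ... ~20 of these, mixing arithmetic, recall, and format-following
    ]

    MAX_FAILURES = 3  # tolerate a couple of incidental misses

    def run_gauntlet(ask_model: Callable[[str], str]) -> bool:
        failures = 0
        for prompt, check in GAUNTLET:
            try:
                ok = check(ask_model(prompt))
            except Exception:
                ok = False
            if not ok:
                failures += 1
        return failures <= MAX_FAILURES  # False => block the rollout

Run it on every deploy (and, per the comment above, on hardware changes too) and gibberish on the scale people were reporting would trip it almost immediately.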


Reading between the lines a bit here, it would probably require more specialized testing infrastructure than normal.

I used to be an SRE at Google and I wrote up internal postmortems there. To me, this explanation feels a lot like they are trying to avoid naming any of their technical partners, but the most likely explanation for what happened is that Microsoft installed some new GPU racks without necessarily informing OpenAI or possibly only informing part of their ops team, and that this new hardware differed in some subtle way from the existing hardware. Quite possibly that means a driver bug, or some sort of hardware incompatibility that required a workaround. Certainly, they would not want to be seen publicly attacking Nvidia or Microsoft given the importance of these two partners, so keeping it high level would certainly be for the best. Virtually. None of openai's customers would be able to use any further technical detail anyway, and they may still be working out a testing strategy that would allow them to detect changes in the hardware mix that unexpectedly cause regressions without necessarily any software deployments being involved.


This is the most grounded take and what I think probably happened as well.

For companies this size, with these valuations, everything the public is meant to see is heavily curated to accommodate all kinds of non-technical interests.


In this case, it was a problem with tokenization, which is deterministic.


“We don’t understand why neural networks work” is a myth. It’s not a miracle, it’s just code, and you can step through it to debug it the same way you would any other program.


You can step-debug a Python program using WinDbg, but it won't tell you a lot about what's happening, since every line is hundreds of CPython API calls.

Sure, you can "step through" a neural network but all you'll ever see are arrays of meaningless floats moving around.
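
For what it's worth, this is roughly what "stepping through" gets you in practice. A toy PyTorch sketch with forward hooks on a tiny stand-in model (nobody's production setup, just an illustration): you get shapes and raw floats, with nothing telling you what any of it means.

    import torch
    import torch.nn as nn

    # A tiny stand-in model; a real LLM is just vastly more of the same.
    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    def dump(name):
        def hook(module, inputs, output):
            # All you see at each "step": a shape and a pile of floats.
            print(name, tuple(output.shape), output.flatten()[:5].tolist())
        return hook

    for i, layer in enumerate(model):
        layer.register_forward_hook(dump(f"layer {i} ({type(layer).__name__})"))

    with torch.no_grad():
        model(torch.randn(1, 8))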


We know more about NNs than you think. You are vastly underestimating the field and what knowledge we have.



