Is it consistently worse or just sometimes/often worse than before? Any extreme power users or GPT-whisperers here? If it’s only noticeably worse X% of the time my bet would be experimentation.
One of my least favorite patterns at tech companies is using “Experimentation” overzealously or prematurely. Mainly, my problem is that they’re not transparent about it, and it creates an inconsistent product experience that just confuses you - why did this one Zillow listing have this UI order, but the similar one I clicked seconds later had a different one? Why did this page load on Reddit get some weirdass font? Because it’s an experiment, the bar to launch is low, and you’re not gonna find any official blog post about the change until it launches for real. And when it causes serious problems, there’s no form to submit and nobody to tell you why; only very rarely would support, other users, or documentation even realize some change is from an experiment. Over the past few years I’ve started noticing this everywhere online.
Non-sticky UI experiments are especially bad because at, e.g., 1% of pageloads, the signal mostly ends up measuring users asking themselves wtf is up and temporarily spending more time on the page trying to figure out where the data moved. Sticky and/or less noticeable experiments, like whatever this could be, have stronger signals but are even more annoying as a user, because there’s no notice that you’re essentially running some jank beta version and no way to opt back into the default - for you it’s just broken. Especially not cool if you’re a paying customer.
I’m not saying it’s necessarily an experiment, it could be just a regular release or nothing at all. I’d hope if OpenAI was actually reducing the parameter size of their models they’d publicly announce that, but I could totally see them running an experiment measuring how a cheaper, smaller model affects usage and retention without publishing anything, because it’s exactly the kind of “right hand doesn’t know what the left is doing” thing that happens at fancy schmancy tech companies.
It is not worse for me. I do notice the novelty has worn off. Asking chatGPT4 why people on here would think this, I think it nails it with the novelty effect lol:
"Indeed, the performance of an AI model like ChatGPT doesn't deteriorate over time. However, human perception of its performance can change due to a variety of psychological factors:
Expectation Bias: As users become more familiar with AI capabilities, their expectations may increase over time. When AI doesn't meet these heightened expectations, users might perceive this as a decline in performance.
Novelty Effect: At the beginning, the novelty of interacting with an AI could lead to positive experiences. However, as the novelty wears off, users may start to focus more on the limitations, creating a perception of decreased performance."
Without this thread I would have said it got stronger with the May 12th update. I don't think that is really true though. There's a random streakiness to it: runs of questions it's good at answering vs. runs of questions it's less good at answering.
Yeah, there are people ITT claiming that even the API model marked as the 3/14 release version is different from what it used to be. I guess that's not entirely outside the realm of possibility (if OpenAI is just lying), but I think it's way more likely this thread is mostly evidence of the honeymoon effect wearing off.
The specific complaints have been well-established weaknesses of GPT for a while now too: hallucinating APIs, giving vague/"both sides" non-answers to half the questions you ask, etc. Obviously it's a great technical achievement, but people seemed to really overreact initially. Now that they're coming back to Earth, cue the conspiracy theories about OpenAI.
Could be. But it could also be that those people (myself included) are right.
It's not that this is without precedent - there's a paper and a YouTube video with a Microsoft person saying on record that GPT-4 started to get less capable with every release, ever since OpenAI switched focus to "safety" fine-tuning, and MS actually benchmarked it by applying the same test (unicorn drawing in TikZ), and that was even before public release.
Myself, sure, it may be the novelty effect, or the Baader–Meinhof phenomenon - but in the days before this thread, I observed that:
- Bing Chat (which I hadn't used until ~a week ago; before that, I used GPT-4 API access) has been giving surface-level and lazy answers -- I blamed, and still mostly blame, this on the search capability, as I noticed GPT-4 (API) through TypingMind also gets dumber if you enable web search (which, in the background, adds a substantial amount of instructions to the system prompt) -- however,
- GPT-4 via Azure (at work) and via OpenAI API (personal) both started to get lazy on me; before about 2-3 weeks ago, they would happily print and reprint large blocks of code for me; in the last week or two, both models started putting in placeholder comments. I noticed this because I use the same system prompt for coding tasks, and the first time the model ignored my instructions to provide a complete solution, opting to add placeholder comments instead, was quite... startling.
- In those same 2-3 weeks, I've noticed GPT-4 via Azure being more prone to give high-level overview answers and telling me to ask for more help if I need it (I don't know if this affected GPT-4 API via OpenAI; it's harder to notice with the type of queries I do for personal use);
All in all, I've noticed that over past 2-3 weeks, I was having to do much more hand-holding and back-and-forth with GPT-4 than before. Yes, it's another anecdote, might be novelty or Baader–Meinhof, but with so many similar reports and known precedents, maybe there is something to it.
Fair enough, I think it's realistic that an actual change is part of the effect with the ChatGPT interface, because it has gotten so much attention from the general public. Azure probably fits that somewhat as well. I just don't really see why they would nerf the API and especially why they would lie about the 3/14 model being available for query when secretly it's changing behind the scenes.
FWIW I was pretty convinced this happened with Dall-E 2 for a little while, and again maybe it did to some extent (they at least decreased the number of images, so the odds of a good one appearing decreased). But also, when I looked back at some of the earlier images I linked for people on request threads, I found there were more duds than I remembered. The good ones were just so mind blowing at first that it was easy to ignore bad responses (plus it was free then).
These are my thoughts too. As I’ve used it more I’ve begun to scrutinize it more and I have a larger and larger history of when it doesn’t work like magic. Although it works like magic often as well.
We’ve also had time to find its limits and verify or falsify early assumptions, which were very likely positive.
No place I worked at ever experimented at the pageload level. We experimented at the user level, so 1% of users would get the new UI. I suppose this is only possible at the millions-of-users scale, which all of them had.
I updated the comment to reflect that. Certainly the signal is stronger because you’re amortizing away the surprise factor of the change, and at least it’s a consistent UX, but the worst-case tradeoff is that experiment-group users get a broken product with no notice and no escape hatch. Unless you’re being very careful, meticulous, and transparent, it’s just not acceptable to do to a paying customer.
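Since user-level bucketing came up: here’s a minimal sketch of how sticky assignment is typically done - hash the user ID together with the experiment name so the same user always lands in the same group. (The function name, experiment name, and 1% split are made up for illustration, not any particular company’s framework.)

```python
import hashlib

def assign_bucket(user_id: str, experiment: str, treatment_pct: float = 1.0) -> str:
    """Deterministically bucket a user into 'treatment' or 'control'.

    Hashing (experiment, user_id) keeps the assignment sticky per user
    (unlike per-pageload randomization) and independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    # Map the hash onto [0, 100) with 0.01 granularity
    roll = (int(digest, 16) % 10000) / 100.0
    return "treatment" if roll < treatment_pct else "control"

# Roughly 1% of users land in treatment, and a given user always gets the same answer
print(assign_bucket("user-12345", "new_listing_ui", treatment_pct=1.0))
```

Per-pageload randomization is the same idea but rolled per request instead of per user, which is exactly why it produces the inconsistent experience described upthread.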