OpenAI: Increased errors across API and ChatGPT (status.openai.com)
71 points by zeptonix on Nov 28, 2023 | 61 comments


I think we need a new type of status page, or at least a public version number on LLMs. Yesterday GPT-4 started giving me nonsense, super-generic answers, like it was hardly reading what I wrote, and today it is back to top-notch performance. I think they were trying to make the model more efficient or something, but I just saw a massive decrease in the quality of output. From my side, though, there is no version number except for "4"...


In a recent interview (on the Hard Fork podcast), Sam Altman mentioned that due to the load they have been trying to make optimizations, disable certain features, etc., so it's not outside the realm of possibility that some tweak caused this.


I think one of the harder things about developing these models is that regressions are hard to figure out or even detect.


That's a good point; I'd be curious to understand more about what the testing setup is like for these kinds of systems.


I experienced the same with the API; it simply ignored all system messages. These things should be clearly announced beforehand.


This conspiracy always comes up - don't you think that they test the output of the model revisions on probably 1000s of downstream tasks at this point? Bad responses are hard to reason about, could be prompting, could be a model revision, could just be bad luck.


Or maybe they are just AB testing and aggressively optimizing the response generation?

LLMs are known to be compute/energy hungry to execute. It is a developing technology, if not downright experimental.

Therefore, this explanation is very likely. I cannot see the reason to call this a conspiracy.


AB testing on what? AB tests need to produce some results which are then compared. How would releasing different versions in production help with that?

It would make more sense if that was internal and the responses were then graded.

A failed canary release would be more likely, where they released this version to a small number of people without realising it was bad.


Off the top of my head: responses have feedback buttons below them.

You can simply deploy different versions and compare the (neutral + positive) / negative feedback ratio.

It would be sinful if they did not add other metrics like how many times the user had to correct and update their prompt before ending the chat, etc.

Data, data, data...
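
To make that concrete, here's a tiny sketch of comparing such a ratio across deployed variants; the records and field names here are purely hypothetical:

    from collections import Counter

    # Hypothetical feedback log: (variant, feedback) pairs.
    feedback_log = [
        ("variant_a", "positive"), ("variant_a", "negative"),
        ("variant_b", "neutral"), ("variant_b", "positive"),
    ]

    def feedback_ratio(records, variant):
        """(neutral + positive) / negative for one deployed variant."""
        counts = Counter(fb for v, fb in records if v == variant)
        negatives = counts["negative"] or 1  # avoid division by zero
        return (counts["neutral"] + counts["positive"]) / negatives

    for variant in ("variant_a", "variant_b"):
        print(variant, feedback_ratio(feedback_log, variant))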


There are the up/down thumbs, and automatic sentiment analysis can serve as a test too.


Calling that a conspiracy is like saying it's a conspiracy theory that Meta shows different people different ads. I'd be more concerned if OpenAI WASN'T constantly trying to tune their models. It's literally their job to tune the models.


> OK I have a table in postgresql and I am adding a trigger such that when an insert happens on that table, an insert happens on another table. The second table has a constraint. What happens to the first insert if the second insert violates the constraint?

How can I get help with this now?

Google result 1: https://stackoverflow.com/questions/77148711/create-a-trigge...

Google result 2: https://dba.stackexchange.com/questions/307448/postgresql-tr...

Like 90% of my questions like this are going to ChatGPT these days.

I can figure it out via the docs, but ChatGPT is SO convenient for things like this.


Well, were it possible, I'd say go back in time and study your tools so that you're not spending the journeyman period of your career ricocheting between tutorials and FAQs.

Failing that, read the documentation. Failing that, stand up a quick experiment.

Somehow, we survived before ChatGPT and even before saturated question boards. Those strategies are still available to you and well worth learning.


The "good old days weren't always good". I'm tired of either limiting myself to the information I have on the top of my head, the LLMs are really helping allow me to be creative and stretch out to do things that are just beyond my bread and butter, or things that I do infrequently.


Exactly this. I *could* become an expert in the intricacies of every tool I touch, or I could use ChatGPT and move on to solving the next problem.


LLMs are the great equalizer of our time.


I see your point, but the world changes so fast. Back in my day you just needed to learn C, understand algorithms and so on, and then you could go deeper into an area or two. Today, you need to understand and be able to proficiently use so many technologies that you can feel lost.

And this is what happens when, say, you lose a job you've been doing for 10-15 years. You need to re-learn the world, and a lifetime is not enough to do it the way we used to do it.


Yeah, not all of us have memories that work like that. I’ve studied my tools but often forget the little details. My productivity has increased since GPT has come out.


Stuff changes too. There are things that are worth learning and being fluent in: regex, SQL. But even then there are always edge cases or weirdness that someone has solved before. LLMs are just much better for this than wading through forum posts.


We also survived before the internet and indoor plumbing and fire, and yet life is so much better now.


I'd go straight to the experiment: create the tables on a local Postgres DB and try to get it to work.
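
For what it's worth, here's a minimal sketch of that experiment as a Python script (assuming psycopg2 and a local scratch database; table and trigger names are made up, and EXECUTE FUNCTION needs Postgres 11+):

    import psycopg2

    conn = psycopg2.connect(dbname="scratch")  # adjust connection params as needed
    cur = conn.cursor()

    # t1's trigger copies every insert into t2, which has a CHECK constraint.
    cur.execute("""
        CREATE TABLE t1 (id int);
        CREATE TABLE t2 (id int CHECK (id > 0));
        CREATE FUNCTION copy_to_t2() RETURNS trigger AS $$
        BEGIN
            INSERT INTO t2 VALUES (NEW.id);
            RETURN NEW;
        END;
        $$ LANGUAGE plpgsql;
        CREATE TRIGGER t1_copy AFTER INSERT ON t1
            FOR EACH ROW EXECUTE FUNCTION copy_to_t2();
    """)
    conn.commit()

    try:
        cur.execute("INSERT INTO t1 VALUES (-1)")  # trigger's insert violates the CHECK
        conn.commit()
    except psycopg2.IntegrityError as e:
        print("statement failed:", e)
        conn.rollback()

    cur.execute("SELECT count(*) FROM t1")
    print("rows in t1:", cur.fetchone()[0])  # 0 => the first insert was rolled back too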


Agreed that chatgpt is great for this kind of thing - a coworker is working on this GPT specifically for postgres https://chat.openai.com/g/g-uXYoYQEFi-sql-sage

But with it being down, my biggest advice would be to try it and see. Something like dbfiddle.uk is perfect for these kinds of tests.


So ChatGPT says -- to me, a minute ago, YMMV -- it will roll back the first insert. Now what? Do you believe it? Cool. I wouldn't. I would confirm its claim, either by Googling or by trying it myself.

Also, when I asked it "what if I use PostgreSQL's non-transactional triggers", which I thought I just made up, it told me it wouldn't roll back the first insert: "Non-transactional triggers are executed as part of the statement that triggered them, but they don't participate in the transaction control." So now I don't know what to think.


> What happens to the first insert if the second insert violates the constraint?

Try it and see? Why do you need an AI to help with this?


Why do you use an internet search engine when you can walk to the library?


The question at hand is pretty easy to test manually and the information you get is much more useful. You will get to see the exact behavior for yourself, can easily build on the test case as related questions come up, and you know the information you are getting is correct rather than a hallucination.

Copying information from ChatGPT is the newer version of blindly copying answers from StackOverflow. It often works out OK and at times makes sense to do, but it can easily lead to software flaws, and it doesn't do much to build a better understanding of the domain, which is necessary to solve more difficult challenges that don't fit into a Q&A format well.


In my experience, I encounter more issues and waste more time when I fiddle on my own and try stuff than when I do the same but with ChatGPT.

There is a lot of knowledge that I don’t want to have expertise with. Sure, I could carefully read the PostgreSQL documentation about triggers and implement it myself, or I could get the job done in a few minutes and procrastinate on HN instead.


> The question at hand is pretty easy to test manually and the information you get is much more useful.

This approach can be hazardous to the health of the product you're building. For example, if you take this approach to answer the question of "what happens if I have two connections to a MySQL database, start a transaction in one of them and insert a row (but don't commit) and then issue a SELECT which would show the inserted row", then you will see consistent results across all of the experiments you run with that particular database, but you could easily end up with bugs that only show up when the transaction isolation level changes from how you tested it.

Whereas if you search for or ask that question, the answers you get will likely mention that transaction isolation levels are a thing.

You might also be able to get this level of knowledge by reading the manual, though there will still be things that are not included in the manual but do come up regularly in discussions on the wider internet.
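
To make the two-connection experiment concrete, here's a rough sketch (assuming PyMySQL and a scratch database; credentials are placeholders). The point is that what the SELECT sees depends on connection B's isolation level, which a single test run won't reveal:

    import pymysql

    conn_a = pymysql.connect(host="localhost", user="test",
                             password="test", database="scratch")
    conn_b = pymysql.connect(host="localhost", user="test",
                             password="test", database="scratch",
                             autocommit=True)  # each SELECT gets its own transaction

    with conn_a.cursor() as cur:
        cur.execute("CREATE TABLE IF NOT EXISTS items (id INT)")
    conn_a.commit()

    # Connection A: insert a row but deliberately do NOT commit.
    conn_a.begin()
    with conn_a.cursor() as cur:
        cur.execute("INSERT INTO items VALUES (1)")

    # Connection B: visibility of the uncommitted row depends on isolation level.
    for level in ("READ COMMITTED", "READ UNCOMMITTED"):
        with conn_b.cursor() as cur:
            cur.execute(f"SET SESSION TRANSACTION ISOLATION LEVEL {level}")
            cur.execute("SELECT COUNT(*) FROM items")
            # expect 0 under READ COMMITTED, 1 (a dirty read) under READ UNCOMMITTED
            print(level, "sees", cur.fetchone()[0], "row(s)")

    conn_a.rollback()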


> you could easily end up with bugs that only show up when the transaction isolation level changes from how you tested it.

In fact it's very likely you would. You have to understand the transaction semantics and test with all the isolation levels and database platforms you intend to support. If you don't know this, you need to learn more about relational databases before building a product on top of them.


> If you don't know this, you need to learn more about relational databases before building a product on top of them.

And now extend this principle to everything in the stack.


You should at least have some basic ideas about your stack.


Agreed. But also if you suddenly find that the precise behavior of one of the parts of your stack matters, you would be well advised to search the internet about how exactly that bit works in practice and whether there are any nonobvious footguns in addition to your empirical testing and the stuff that the manual claims is true.


You can also, uh, just try it with a trivial test and see what happens.


This happens so often that it's made it really easy to test an app I'm developing for API outages and put it into "maintenance mode" accordingly. I don't even need to mock the outage... I just wait for the weekly occurrence.
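
In case it's useful to anyone, the check itself can be as simple as a cheap liveness probe; a sketch (the endpoint choice and the app-side flag are just illustrative):

    import os
    import requests

    def openai_api_healthy(timeout=5):
        """Cheap liveness probe: can we list models without an error status?"""
        try:
            r = requests.get(
                "https://api.openai.com/v1/models",
                headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
                timeout=timeout,
            )
            return r.status_code == 200
        except requests.RequestException:
            return False

    # Hypothetical app-side flag: flip into maintenance mode when the probe fails.
    MAINTENANCE_MODE = not openai_api_healthy()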


As an FYI, Bing Chat at http://bing.com/chat continues to work even during OpenAI outages. It's also running GPT-4 -- it can be annoying when it reaches out to search, but you can usually explicitly prompt it to not do that.


Yes it’s indeed an option. Note that Creative mode uses GPT-4, not the default Balanced.


What does Balanced use then? 3.5?


Same here for the past 30+ minutes; I was surprised not to see it on HN, but I guess it just took a little time for someone to post. I tried Bard, and it reminded me how far behind it is on programming questions.


You might find the results from Google Cloud's Vertex AI better than the general-purpose Bard. They have a number of pre-trained models for coding tasks. You can chat in the console UI or use the API directly. They also offer a number of open-source models (Codey & Llama), so you can easily try different models.

https://cloud.google.com/vertex-ai

https://console.cloud.google.com/vertex-ai/model-garden
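
A rough sketch of calling one of the Codey models through the SDK (assuming the google-cloud-aiplatform package; project and region are placeholders):

    import vertexai
    from vertexai.language_models import CodeChatModel

    vertexai.init(project="YOUR-PROJECT", location="us-central1")

    # Codey's chat model for coding questions; swap the name to try other models.
    model = CodeChatModel.from_pretrained("codechat-bison")
    chat = model.start_chat()
    response = chat.send_message("Write a Postgres trigger that copies inserts to another table.")
    print(response.text)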


I am finding Bard almost comparable. GPT quality is declining, I think.

It wouldn't surprise me if Bard permanently surpasses GPT in the next quarter, particularly if OpenAI is dialing down quality...


Does anyone know if Azure's OpenAI Studio is down as well? For everyone using the ChatGPT APIs in production, that seems like the most straightforward failover mechanism.

I'll use this to plug our open-source tool https://github.com/Marvin-Labs/lbgpt, which lets ChatGPT consumers quickly load-balance and fail over between OpenAI and Azure models.
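
For anyone rolling their own instead, the core pattern is just try-the-primary-then-fall-back; a minimal sketch with the 1.x Python client (the Azure endpoint, key, API version, and deployment name are placeholders):

    from openai import OpenAI, AzureOpenAI, OpenAIError

    primary = OpenAI()  # reads OPENAI_API_KEY from the environment
    fallback = AzureOpenAI(
        azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",
        api_key="YOUR-AZURE-KEY",
        api_version="2023-07-01-preview",
    )

    def chat(messages, model="gpt-4", azure_deployment="gpt-4"):
        """Try OpenAI first; on any API error, retry against Azure OpenAI."""
        try:
            return primary.chat.completions.create(model=model, messages=messages)
        except OpenAIError:
            # Azure routes by deployment name rather than model name.
            return fallback.chat.completions.create(model=azure_deployment,
                                                    messages=messages)

    resp = chat([{"role": "user", "content": "ping"}])
    print(resp.choices[0].message.content)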


Azure OpenAI is running normally https://azure.status.microsoft/en-us/status


Has anyone managed to replicate the “search the web” functionality through the API? I’ve set up two “functions”, one to get search results and one to extract the text from a search result and feed it back to the AI, but I am a bit stuck.

What do you use to extract the text from a webpage and how do you handle websites with anti-bot measures?


Shouldn’t the API response be the exact response you would get if you sent the same input to ChatGPT, assuming it's the same model?


No, ChatGPT is more than just a UI for the OpenAI API. Web requests are a feature built into ChatGPT using the API's support for function calls, but the API doesn't make any external web requests by itself.
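
Roughly, the API side of that looks like the sketch below (1.x Python client; the search tool's name and schema are your own to define, and actually running the search, e.g. via a search API plus requests/BeautifulSoup, is up to you):

    from openai import OpenAI

    client = OpenAI()

    # Describe a search tool the model may call; the implementation is yours.
    tools = [{
        "type": "function",
        "function": {
            "name": "search_web",
            "description": "Search the web and return result snippets",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": "What's new in PostgreSQL 16?"}],
        tools=tools,
    )

    # If the model decided to search, it returns a tool call instead of prose;
    # you execute the search yourself and send the results back in a follow-up.
    for call in resp.choices[0].message.tool_calls or []:
        print(call.function.name, call.function.arguments)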


The API doesn't have access to the web search functionality unless something changed.


We should expect OpenAI's API to go down regularly, at least every week [0], just like GitHub does all the time. But have you tried contacting the CEO of OpenAI this time?

[0] https://news.ycombinator.com/item?id=38371339


You may want to try https://lemonfox.ai/ as an OpenAI API alternative. I think relying on open-source models is a great alternative.


The solution to 'GPT-4 sometimes breaks' isn't to use something that never works...


Have you never used the open-source models? They are getting really good: better than 3.5 for sure, though not as good as 4 except when domain-trained, in my opinion.


Fine-tuned local models can work just as well as, or better than, GPT-4 in many use cases.


I'm running this comparison of free and open AI engines:

https://www.gnod.com/search/ai

Looks like they all currently work.

If there are more, let me know.


I’m seeing everyone reporting “laziness” today on X. Like it’s telling people to do their own coding. What’s up with that?


The AI has joined the Teamsters.


Why don't they disable the free version when they're hitting this type of load?


Not working for me. Am I going to have to read docs or search Google like some boomer?


If you are not ok with casually saying racist or sexist things, you probably also shouldn’t say ageist things either.


I think it's okay to make little generational jokes. It's not like they said... "Do I have to google this like some old f*ck?"... Certain generations are slower to pick up technology.


With Google and co. you at least know when search is wrong.


Don't people often fall into the "vaccines cause autism" trap from Google?


(speaking as a just-slightly-pre-Boomer) Yes. But probably not for long.

However, you might want to get used to it, as it looks like it might happen not uncommonly.



