I think we need a new kind of status page, or at least a public version number, for LLMs. Yesterday GPT-4 started giving me nonsensical, super-generic answers, as if it was hardly reading what I wrote, and today it's back to top-notch performance. I suspect they were trying to make the model more efficient or something, but I just saw a massive decrease in the quality of output. From my side, though, there is no version number beyond "4"...
In a recent interview (the Hard Fork podcast), Sam Altman mentioned that due to the load they have been trying to make optimizations, disable certain features, etc., so it's not outside the realm of possibility that some tweak caused this.
This conspiracy always comes up. Don't you think they test the output of model revisions on probably thousands of downstream tasks at this point? Bad responses are hard to reason about: it could be the prompting, could be a model revision, could just be bad luck.
A/B testing on what? A/B tests need to produce results which are then compared. How would releasing different versions in production help with that?
It would make more sense if that was internal and the responses were then graded.
A failed canary release would be more likely: they rolled the new version out to a small number of people without realising it was bad.
Calling that a conspiracy is like saying it's a conspiracy theory that Meta shows different people different ads. I'd be more concerned if OpenAI WASN'T constantly trying to tune their models. It's literally their job to tune the models.
> OK I have a table in postgresql and I am adding a trigger such that when an insert happens on that table, an insert happens on another table. The second table has a constraint. What happens to the first insert if the second insert violates the constraint?
Well, were it possible, I'd say go back in time and study your tools, so that you don't spend the journeyman period of your career ricocheting between tutorials and FAQs.
Failing that, read the documentation. Failing that, stand up a quick experiment.
Somehow, we survived before ChatGPT, and even before saturated question boards. Those strategies are still available to you and well worth learning.
The "good old days" weren't always good. I'm tired of limiting myself to the information I have at the top of my head; LLMs really help me be creative and stretch out to do things that are just beyond my bread and butter, or things that I do infrequently.
I see your point but the world changes so fast. Back in my day you just needed to learn C, understand algorithms and so on and then you could get deeper in an area or two. Today, you need to understand and be able to proficiently use so many technologies that you can feel lost.
And this is what happens when, say, you lose a job you've been doing for 10-15 years. You need to re-learn the world, and a lifetime is not enough to do it the way we used to do it.
Yeah, not all of us have memories that work like that. I’ve studied my tools but often forget the little details. My productivity has increased since GPT has come out.
Stuff changes, too. There are things that are worth learning and being fluent in: regex, SQL. But even then there are always edge cases or weirdness that someone has solved before, and LLMs are just much better for that than wading through forum posts.
So ChatGPT says -- to me, a minute ago, YMMV -- that it will roll back the first insert. Now what? Do you believe it? Cool. I wouldn't. I would confirm its claim, either by Googling or by trying it myself.
Also, when I asked it "what if I use PostgreSQL's non-transactional triggers" -- a feature I thought I had just made up -- it told me it wouldn't roll back the first insert: "Non-transactional triggers are executed as part of the statement that triggered them, but they don't participate in transaction control." So now I don't know what to think.
The question at hand is pretty easy to test manually and the information you get is much more useful. You will get to see the exact behavior for yourself, can easily build on the test case as related questions come up, and you know the information you are getting is correct rather than a hallucination.
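To make that concrete, here's roughly what that manual experiment looks like. This is a sketch using Python's built-in sqlite3 as a stand-in (SQLite's default statement-level abort behaves the same way for this case, but for an authoritative answer you'd run the equivalent DDL against Postgres itself; the table and column names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE main_t  (id INTEGER PRIMARY KEY, val TEXT);
CREATE TABLE audit_t (id INTEGER PRIMARY KEY, val TEXT NOT NULL);
CREATE TRIGGER copy_row AFTER INSERT ON main_t
BEGIN
    -- deliberately violate audit_t's NOT NULL constraint
    INSERT INTO audit_t (val) VALUES (NULL);
END;
""")

try:
    conn.execute("INSERT INTO main_t (val) VALUES ('hello')")
except sqlite3.IntegrityError as err:
    print("insert failed:", err)

count = conn.execute("SELECT COUNT(*) FROM main_t").fetchone()[0]
print("rows in main_t:", count)  # 0 -- the first insert was undone as well
```

Five minutes of this and you've seen the behavior with your own eyes, plus you have a harness to poke at the follow-up questions.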
Copying information from ChatGPT is the newer version of blindly copying answers from StackOverflow. It often works out OK and at times makes sense to do, but it can easily lead to software flaws, and it doesn't do much to build the better understanding of the domain that's necessary to solve more difficult challenges that don't fit into a Q&A format well.
In my experience, I run into more issues and waste more time when I fiddle and try stuff entirely on my own than when I do the same with ChatGPT's help.
There is a lot of knowledge that I don’t want to have expertise with. Sure, I could carefully read the PostgreSQL documentation about triggers and implement it myself, or I could get the job done in a few minutes and procrastinate on HN instead.
> The question at hand is pretty easy to test manually and the information you get is much more useful.
This approach can be hazardous to the health of the product you're building. For example, suppose you use it to answer: "what happens if I have two connections to a MySQL database, start a transaction in one of them, insert a row (but don't commit), and then issue a SELECT in the other that would show the inserted row?" You will see consistent results across all of the experiments you run with that particular database, but you could easily end up with bugs that only show up when the transaction isolation level differs from how you tested.
Whereas if you search for or ask that question, the answers you get will likely mention that transaction isolation levels are a thing.
You might also be able to get this level of knowledge by reading the manual, though there will still be things that are not included in the manual but do come up regularly in discussions on the wider internet.
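As a concrete illustration of the two-connection experiment, here's a sketch using Python's built-in sqlite3, which is exactly the trap being described: SQLite gives you essentially one (serializable) behavior, so a test like this can never reveal how MySQL would behave under, say, READ UNCOMMITTED:

```python
import os
import sqlite3
import tempfile

# two separate connections to the same on-disk database
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit; we issue BEGIN ourselves
reader = sqlite3.connect(path, isolation_level=None)

writer.execute("CREATE TABLE t (id INTEGER)")
writer.execute("BEGIN")
writer.execute("INSERT INTO t VALUES (1)")  # open transaction, not yet committed

before = reader.execute("SELECT * FROM t").fetchall()  # other connection sees nothing
writer.execute("COMMIT")
after = reader.execute("SELECT * FROM t").fetchall()   # now the row is visible
print(before, after)  # [] [(1, )]
```

The experiment "passes" every time you run it here, yet it tells you nothing about what a different engine, or a different isolation level, would do.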
> you could easily end up with bugs that only show up when the transaction isolation level changes from how you tested it.
In fact it's very likely you would. You have to understand the transaction semantics and test with all the isolation levels and database platforms you intend to support. If you don't know this, you need to learn more about relational databases before building a product on top of them.
Agreed. But also if you suddenly find that the precise behavior of one of the parts of your stack matters, you would be well advised to search the internet about how exactly that bit works in practice and whether there are any nonobvious footguns in addition to your empirical testing and the stuff that the manual claims is true.
This happens so often that it's been really easy to test how an app I'm developing handles API outages and whether it goes into "maintenance mode" accordingly. I don't even need to mock the outage... just wait for the weekly occurrence.
As an FYI, Bing Chat at http://bing.com/chat continues to work even during OpenAI outages. It's also running GPT-4 -- it can be annoying when it reaches out to search, but you can usually explicitly prompt it to not do that.
Same here for the past 30+ minutes; I was surprised not to see it on HN, but I guess it just took a little time for someone to post. I tried Bard, and it reminded me how far behind it is on programming questions.
You might find the results from Google Cloud's Vertex AI better than the general-purpose Bard. They have a number of pre-trained models for coding tasks; you can chat in the console UI or use the API directly. They also offer a number of open-source models (Codey & Llama), so you can easily try different models.
Does anyone know if Azure's OpenAI Studio is down as well? For everyone using the ChatGPT APIs in production, it should be the most straightforward failover mechanism.
Using this opportunity to plug our open-source tool https://github.com/Marvin-Labs/lbgpt, which lets ChatGPT consumers quickly load-balance and fail over between OpenAI and Azure models.
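For anyone rolling their own, the core failover pattern is tiny. A minimal sketch with made-up stand-in providers; real code would wrap the OpenAI and Azure SDK clients and catch their specific error types rather than a bare Exception:

```python
from typing import Callable, Sequence

def call_with_failover(providers: Sequence[Callable[[str], str]], prompt: str) -> str:
    """Try each provider in order; return the first successful response."""
    last_err = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:  # real code: catch provider-specific errors
            last_err = err
    raise RuntimeError("all providers failed") from last_err

# usage with stand-in providers (hypothetical names)
def flaky_openai(prompt: str) -> str:
    raise TimeoutError("openai endpoint down")

def azure_fallback(prompt: str) -> str:
    return f"azure says: {prompt}"

print(call_with_failover([flaky_openai, azure_fallback], "hi"))  # azure says: hi
```

A production version would also want per-provider timeouts and backoff, but the ordering logic is the whole trick.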
Has anyone managed to replicate the "search the web" functionality through the API? I've set up two "functions", one to get search results and one to extract the text from a search result and feed it back to the model, but I'm a bit stuck.
What do you use to extract the text from a webpage and how do you handle websites with anti-bot measures?
No, ChatGPT is more than just a UI for the OpenAI API. Web requests are a feature built into ChatGPT on top of the API's support for function calls; the API itself doesn't make any external web requests.
We should expect OpenAI's API to go down regularly, at least every week [0], just like GitHub does all the time. But have you tried contacting the CEO of OpenAI this time?
Have you never used the open-source models? They are getting really good: better than 3.5 for sure, though not as good as 4 except when domain-trained, in my opinion.
I think it's okay to make little generational jokes. It's not like they said "Do I have to google this like some old f*ck?" Certain generations are slower to pick up technology.