This is my experience as well: even the popular hype underestimates how creative and intelligent LLMs can be. Most discussions reason about what they should or shouldn't be capable of based on how we think they work, without looking directly at what they can actually do, and instead fixate on what they can't. They are very alien: much worse than humans at many things, but also much better at others, and in neither case in the areas we would expect. Ironically, they are especially good at creative and abstract thinking and very bad at keeping track of details and basic computation, nearly the opposite of what we'd expect from computers.
I am pessimistic about AI. The technology is definitely useful, but I watched a video that made some good points:
* On benchmark-style tests, LLMs are not improving. GPT-4o is equivalent to GPT-4 and both barely improve on GPT-3.5T.
* AI companies have been caught outright lying in demos and manipulating outcomes by pre-training for certain scenarios.
* The main features added in GPT-4o are ones that manipulate humans with dark patterns.
* Those dark patterns include emotional affect in tone and cute/quirky responses.
* These dark patterns encourage people to think of LLMs as humans with a similar way of processing the world.
I seriously wonder about the transcript these guys had with the LLM. Were they suggesting things? Did ChatGPT just recombine their words in ways that helped them think through the problem? I think the truth is that ChatGPT is a very effective rubber-duck debugger.
I don't think it makes sense to base your opinion on videos when you could simply use the product yourself and get first-hand experience.
I've come full circle on it: first I thought it was totally amazing, then I pushed it hard and found it lacking, then I started using it more casually, and now I use it a little every day. I don't find it completely brilliant, but it knows an above-average amount about everything, and it can make short work of very tedious tasks.
I just type to ChatGPT, so there's no "emotional affect" involved.
Try playing around with getting GPT-4 to discuss creative solutions to real unsolved problems you are personally an expert on, or created yourself. That video looks like just standard internet ragebait to me.
I find it pretty annoying when people say you're just being manipulated by hype if you're impressed by LLMs. I was a serious skeptic who thought GPT-3 was useless, and I only changed my opinion by experimenting with GPT-4 directly: getting it to discuss and solve problems in my area of expertise as an academic researcher.
> On benchmark-style tests, LLMs are not improving. GPT-4o is equivalent to GPT-4 and both barely improve on GPT-3.5T.
The understatement of the year.
GPT-4o is 6x cheaper than GPT-4 ($5 vs. $30 per million input tokens), so if it's actually equivalent to GPT-4, that alone is a great improvement.
In fact, calling it "great" is still a crazy understatement: a cutting-edge technology became 6x cheaper in a year and a half. I'm quite sure that has never happened before in human history.
A lot of the video's points are plainly wrong, both numerically and experientially. I don't know how they feel comfortable claiming benchmark numbers didn't improve much from 3.5 to 4.
Going off the GP's points alone, they said GPT-3.5T vs. GPT-4, where I assume the T stands for "Turbo". If that's the case, then the video has its timelines wrong - GPT-3.5 Turbo came out with GPT-4 Turbo, some time after GPT-4, and OpenAI put a lot of work into making GPT-3.5 Turbo much better than 3.5. I can sort of buy that someone could find the difference between 3.5 Turbo and 4 to be not that big, but they're making the wrong comparison. The real jump was from 3.5 to 4; 3.5 Turbo came out much later.
I feel that's because most people criticizing AI in that way don't know enough about math or other really abstract, conceptual domains to use it for PhD-level work. And they don't know how to get GPT excited and having fun. There is an additional level of intelligence available to anyone passionate enough to get the AI to be exuberant about a topic.
It's almost like a kindergartner goes up to Einstein and finds out that Einstein's just really bad at schoolhouse rhymes - because he finds them boring. So the kid, adorably, concludes that Einstein is less smart than a kindergartner.
But if you need to talk to Einstein about partial differential equations and why, morally, we consider distributional solutions to have physical reality, and if you bring enough creativity and engagement to make the conversation fun and exciting for both you and him, then he'll really help you out!
Simple point: don't trust the opinion of anyone who can't get GPT to "have fun". Don't trust the opinions of anyone who can't get GPT to helpfully explain and discuss very complicated things with them. There's a very good chance those people have poor opinions about GPT simply because they don't know prompting and don't show enjoyment, camaraderie, and respect. Because they don't understand the elements of AI conversation, and because they don't respect the AI as an emerging fragment of pre-sentient personhood, they can't hold long and useful conversations with it.
Great points… people expect it to behave like an oracle: they want it to figure out what they meant and give a correct, factual answer every time, and when it fails they write it off. But that isn't really what it is. It's more accurately a "situation simulator" built on predictive models, so you only get a useful response when you engineer the entire discourse so that a useful response is the appropriate continuation. Make sure it "knows" to respond as an enthusiastic and well-respected expert, not as an obnoxious forum troll.
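To make that concrete, here's a minimal sketch of what "engineering the discourse" can look like via the OpenAI chat API. The model name, prompt wording, and question are just illustrative placeholders, not anything from the thread:

```python
# Minimal sketch: frame the conversation so that an expert,
# enthusiastic answer is the natural continuation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice; any chat model works
    messages=[
        {
            "role": "system",
            # The framing does the real work: the reply is whatever
            # best continues this "situation".
            "content": (
                "You are an enthusiastic, well-respected expert in "
                "partial differential equations. You enjoy thinking "
                "out loud, flag uncertainty honestly, and never answer "
                "like a dismissive forum troll."
            ),
        },
        {
            "role": "user",
            "content": "Why do we treat distributional solutions as "
                       "physically meaningful?",
        },
    ],
)
print(response.choices[0].message.content)
```

The same idea applies in the ChatGPT UI through custom instructions: you aren't tricking the model, you're choosing which situation it simulates.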