I keep seeing this cop-out, which ignores that it's fundamentally the same architecture with the same flaws. More wallpaper to hide the cracks just makes it an even worse tool for these use cases, because all it does is fool more people into thinking it has capabilities it fundamentally doesn't.
I don't think this is a fair argument. If we compare a GPT4 architecture with 5,000 parameters and a GPT4 architecture with 1 trillion parameters, should we judge the capabilities of both by the 5,000 parameter version, because they're both the same architecture?
There is more than architecture that can set them apart, too. GPT4 may have been trained with a slightly different algorithm, or on different data, and that can produce fundamentally different results.
Most of these conversations are not focused on one specific version; they are about the capabilities of LLMs in general, and it is implied that we are talking about state-of-the-art LLMs. GPT3 is no longer state-of-the-art.
This is nonsense. It's not a cop-out to say "use the latest, most capable model before complaining". Anyone remotely close to this field knows model size matters, amount of training data matters, quality of training data matters, and several other variables matter. Even if someone knows zero about it, just using 3.5 v 4 is enough to see they are two different things. Like a lizard v a human.
It's still fundamentally the same, hallucinates just the same, and anthropomorphizes itself as a confident, knowledgeable, intelligent being just the same. A newer, better, faster, more capable car still isn't an airplane, even if it goes fast enough to spend several seconds in the air.
Sure, and 40 year olds have the same capabilities as 4 year olds, because "same architecture" or "fundamentally the same". And putting random weights inside the GPT-4 model architecture should behave "fundamentally the same" as the trained GPT-4 weights, because it's "same architecture". Forget this "training" stuff.
It's not a person, it's a machine. And it's one that will still produce hallucinations that embarrassingly prove it has no notion of intelligence, and do so confidently. That it does so less than its sibling is entirely irrelevant.
To me it's a bit like someone making the claim "humans are flawed, and we should think critically about the things they say", and someone responding with "well which human are you talking about? Because Einstein is orders of magnitude above the Walmart checkout guy".
Processing fees aren't inherently expensive. They're low in Europe, and high in the US only because they fund rewards programs. You're paying 2% more but getting 2% back. Legislation could easily regulate them to eliminate rewards programs and bring them back down.
Chargebacks are supposed to be expensive because they're a deterrent to businesses acting in ways that will lead customers to attempt chargebacks. And then they usually have an element of manual review which costs $ as well. Chargebacks being expensive is a feature not a bug. And they're an easy consumer protection option that isn't even a possibility with cash.
Chargebacks exist because the element of agency in payment is broken in the electronic money systems we've created. When using a card, you don't give merchants your money, you provide them with enough information to take your money away from you.
The UX of those two things is similar enough that nobody really cares, until it bites them in the ass, that is.
> Consequently, these laws of nature have only to be discovered, and man will no longer be responsible for his actions, and it will become extremely easy for him to live his life. All human actions, of course, will then have to be worked out by those laws, mathematically, like a table of logarithms, and entered in the almanac; or better still, there will appear orthodox publications, something like our encyclopaedic dictionaries, in which everything will be so accurately calculated and plotted that there will no longer be any individual deeds or adventures left in the world. ‘Then,’ (this is all of you speaking), ‘a new political economy will come into existence, all complete, and also calculated with mathematical accuracy, so that all problems will vanish in the twinkling of an eye, simply because all possible answers to them will have been supplied. Then the Palace of Crystal will arise. Then….’ Well, in short, the golden age will come again. Of course it is quite impossible (here I am speaking myself) to guarantee that it won’t be terribly boring then (because what can one do if everything has been plotted out and tabulated?), but on the other hand everything will be eminently sensible.
I categorically reject the notion that LMMs (Large Markov Models) can ever be aware or intelligent. Comparing weighted next-word-engines to feeling, thinking, aware beings is insulting.
Alright, then you should also consider not basing it on whether you find the concept insulting. That doesn't seem to be the strongest rebuttal available.
> Comparing weighted next-word-engines to feeling, thinking, aware beings is insulting
Why is it reasonable to be so reductionist about e.g. GPT-4 but not be so reductionist about a biological brain? For example, why can't I say that your brain is nothing but a bunch of biological neurons trained using its input and initialized based on your genetics? It's equally true, and equally missing the point.
I think machine learning probably can produce something akin to a brain, but LLMs are not really it, even if they use the digital equivalent of a neuron. From what I understand of what I've read about LLMs, they really seem to be descendants of Markov chains. I think they are valuable and can go a long way, but LLMs themselves will not be "it". I think we will hit a ceiling with them within 10 years if we don't come up with something else. I think the ceiling can be made pretty high, though.
However, most probably, in 10 years we will all laugh at how all of our predictions missed by a long shot.
LMM = Large Markov Model. I use that term because models like GPT-4 and friends are for all intents and purposes Markov chains with more data, more compute, some lossy compression, and a bit of nearest neighbor search. Next-word-engines.
> why can't I say that your brain is nothing but a bunch of biological neurons trained using its input and initialized based on your genetics?
Because we don't think one word at a time, and we don't restart from scratch for every subsequent word.
In what sense does an LLM think one word at a time that doesn't also apply to a person typing at a keyboard? I'm typing one word at a time right now, and I assume you aren't about to declare me a Markov chain. When I read, my brain presumably ingests one word at a time (not sure if it's one exactly, but it can't be much more than one). It is of course true that I have some notion of what I'm going to say before I write the first word, but seemingly so does an LLM.
If it was truly thinking one word at a time, it wouldn't be able to consistently use 'an' vs 'a' correctly, for example.
> we don't restart from scratch for every subsequent word.
LLMs don't restart from scratch for every word: via the attention heads, they can look back through the entire context. Otherwise the memory required for inference wouldn't scale with the context length.
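To make that concrete, here is a minimal sketch of a single causal attention step with a key/value cache (toy dimensions, random weights, Python/PyTorch; an illustration of the mechanism, not GPT-4's actual implementation). Each new token's query reads over all cached keys and values, so earlier context is never thrown away, and the growing cache is exactly why inference memory scales with context length:

```python
import torch

# Toy single-head causal attention with a KV cache (random weights, not a real LLM).
d = 16
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []  # grows by one entry per generated token

def attend(x):
    """x: embedding of the newest token, shape (d,)."""
    q = x @ wq
    k_cache.append(x @ wk)
    v_cache.append(x @ wv)
    K = torch.stack(k_cache)   # (t, d): every earlier token is still visible
    V = torch.stack(v_cache)
    weights = torch.softmax(q @ K.T / d**0.5, dim=-1)
    return weights @ V         # output mixes information from the whole context

for _ in range(8):
    attend(torch.randn(d))
print(len(k_cache))  # 8: memory grows with how much context has been seen
```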
> In what sense does an LLM think one word at a time that doesn't also apply to a person typing at a keyboard?
Because you already have the thought formed before you started typing.
> When I read my brain presumably ingests one word at a time (not sure if it's one exactly, but it can't be much more than one)
And these models ingest many vectors at once, up to the context length. Your brain is also recursive, and regularly goes backwards to rescan earlier words as necessary.
Seems to me it's fundamentally inverted from how we operate, both input and output.
> Because you already have the thought formed before you started typing.
Can you prove that GPT-4 doesn't? Clearly there is a sense in which it thinks more than one word ahead, since, as I mentioned above, it would not otherwise be able to use 'a' vs 'an' correctly.
As far as I am aware, exactly to what extent these models have determined what tokens will be generated before they produce anything is an open question in mechanistic interpretability research. I would be very interested if you knew of some work that answers this question empirically.
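One rough way to poke at the 'a' vs 'an' point from the outside (a sketch only, using GPT-2 via the Hugging Face transformers library as a small stand-in, since GPT-4's weights aren't public, and with a made-up prompt): compare the probability the model assigns to " a" versus " an" when the natural continuation is a vowel-initial noun. A strong preference for " an" suggests the article choice already reflects the noun the model is about to produce.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Probe sketch: does the model prefer " an" when the upcoming noun starts with a vowel?
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "For breakfast she peeled and ate"   # natural continuation: " an orange", " an apple"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits[0, -1]          # scores for the very next token
probs = torch.softmax(logits, dim=-1)

for article in (" a", " an"):
    token_id = tok.encode(article)[0]          # both are single tokens in GPT-2's vocabulary
    print(repr(article), float(probs[token_id]))
```

This only shows the preference at the surface; it doesn't settle what internal plan, if any, produced it.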
Then it's a markov-like with state. Or as I've taken to calling them lately Markov+state. (I couldn't resist, sorry.)
A truck towing a trailer isn't just a car because it pivots in the middle and has more wheels. Its fundamentals of operation are still closer to a car or truck without a trailer than to a bicycle.
Humans can form thoughts and get to mostly correct answers even as a gut feeling, and the language to explain why/how need not even be present. We don't form thoughts one word at a time.
No, it is not Markov-like. GPT models are not Markov processes by definition: they take into account all previous words in the sequence when generating the next word. They have a kind of memory in the form of an attention mechanism that refers to multiple previous states when generating tokens.
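A toy illustration of the distinction being drawn here (made-up corpus and helper names, Python): a first-order Markov chain picks the next word from a table keyed by the current word alone, so everything earlier in the sentence is forgotten, whereas a GPT-style model samples from p(next | w_1, ..., w_t), with attention letting that prediction depend on every earlier position.

```python
import random
from collections import defaultdict

# First-order Markov chain: the next word depends only on the current word.
corpus = "the cat sat on the mat and the dog sat on the cat".split()
table = defaultdict(list)
for cur, nxt in zip(corpus, corpus[1:]):
    table[cur].append(nxt)

def markov_next(current_word):
    # Everything before `current_word` is invisible to this choice.
    return random.choice(table[current_word])

print(markov_next("the"))
# A GPT-style model instead conditions on the entire preceding sequence,
# which a chain like the one above cannot do (short of blowing up the
# state to include the whole context).
```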
They are not human-like and they are not Markov-like. GPT is a separate category.
Dismiss it as the opinions of "a Googler" but it is entirely true. The seemingly coordinated worldwide[1] push to keep it in the hands of the power class speaks for itself.
Both are seemingly seeking to control not only the commercial use and wide distribution of such systems, but even the writing of them and personal use. This will keep even the knowledge of such systems and their capabilities in the shadows, ripe for abuse laundered through black-box functions.
This is up there with the battle for encryption in ensuring a more human future. Don't lose it.