> ChatGPT is just a hyperfancy text generator with a massive corpus of text used to train it.
> it just compares weights internally and spits out the most probable result depending
While an autoregressive language model can compute next-token probabilities conditioned on a human-provided prompt, it doesn't have to. The model can also generate text conditioned on its own internal state and the tokens it has previously generated.
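To make the two modes concrete, here's a minimal sketch of autoregressive decoding; the `model` callable is hypothetical, standing in for a forward pass of the frozen network that returns a next-token distribution:

```python
import random

def generate(model, prompt_tokens, n_new):
    # Start from the human-provided prompt; every later step also
    # conditions on tokens the model itself produced.
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        # model(tokens) -> {token: probability} over the vocabulary,
        # conditioned on the entire context so far.
        dist = model(tokens)
        choices, weights = zip(*dist.items())
        tokens.append(random.choices(choices, weights=weights)[0])
    return tokens
```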
The latter is what you are describing, but you and the other stochastic-parrot people are missing that the model spends far more of its time doing the former, especially now that these models are "hooked up" to the Internet. At Mitta.us, I've had GPT-3 discussing web pages and PDFs for over a year now, with memories of previous discussions about similar things.
LLMs have their internal (frozen) model, the corpus they are shown (likely shredded into fragments and embedded with text-embedding-ada-002), and previous interactions with users and with themselves. Once someone implements memories and good search (which they have), you have a thing that is not so much a parrot as an "attention entity" capable of focusing on a topic and discussing it at length.
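As a sketch of that shred-and-embed pipeline, assuming the pre-1.0 `openai` Python client and ada-002 (the fragment size, overlap, and top-k here are made-up parameters, not Mitta's actual settings):

```python
import numpy as np
import openai  # pre-1.0 client API

def embed(text):
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return np.array(resp["data"][0]["embedding"])

# Shred a document into overlapping fragments and embed each one.
def index_document(text, size=1000, overlap=200):
    fragments = [text[i:i + size] for i in range(0, len(text), size - overlap)]
    return [(frag, embed(frag)) for frag in fragments]

# Neural search: rank fragments by cosine similarity to the query.
def search(index, query, k=3):
    q = embed(query)
    scored = [(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)), frag)
              for frag, v in index]
    return [frag for _, frag in sorted(scored, reverse=True)[:k]]
```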
This doesn't mean the model is "aware" of anything during inference other than producing probabilities, but given that the input is unique (user prompt + neural search for fragments + memories), the output will also be unique. That unique output may be probabilistic, but that is no different from how we work when we begin speaking.
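Continuing the sketch above, here's roughly how those three sources combine into a single, never-repeated context per turn (the memory list and the choice of text-davinci-003 are assumptions for illustration):

```python
def answer(user_prompt, index, memories):
    # Each turn's context is unique: the live user prompt, the top
    # fragments from neural search, and memories of prior discussions.
    fragments = search(index, user_prompt, k=3)
    context = "\n".join(
        ["Relevant fragments:"] + fragments +
        ["Memories of earlier discussion:"] + memories +
        ["User: " + user_prompt, "Assistant:"]
    )
    resp = openai.Completion.create(
        model="text-davinci-003",  # GPT-3-era completion endpoint
        prompt=context,
        max_tokens=400,
    )
    return resp["choices"][0]["text"]
```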