No, it's more like they are trying to predict what some given human might say (amongst other things).
A pretrained transformer in the limit does not converge on any collective or consensus state in that sense; in fact, pre-training actively punishes this. It learns to predict the words of Feynman as readily as the dumbass across the street.
When I say that GPT does not mimic, I mean that the training objective literally optimizes for something beyond that.
Consider <hash, plaintext> pairs. You can't predict the plaintext from the hash without cracking the hash algorithm, but you could easily fool a GAN's discriminator (even one that has learned to compute hash functions) just by generating typical-looking instances.
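A minimal sketch of that asymmetry (the dataset, the `looks_typical` discriminator, and the nat counts are all illustrative assumptions, not anyone's actual training setup): a typicality-based discriminator accepts any plausible-looking plaintext regardless of the hash, while the predictive cross-entropy objective charges a fixed price per pair to any model that cannot actually invert the hash.

```python
import hashlib
import math
import random
import string

random.seed(0)

def random_plaintext(n=8):
    # Toy plaintexts: 8 lowercase letters.
    return "".join(random.choices(string.ascii_lowercase, k=n))

# A dataset of <hash, plaintext> pairs, as in the thought experiment.
pairs = [(hashlib.sha256(p.encode()).hexdigest(), p)
         for p in (random_plaintext() for _ in range(100))]

# A "typicality" discriminator: it accepts any string that merely looks
# like a plausible plaintext (right length, right alphabet). Without the
# ability to invert the hash, typicality is the only signal it has.
def looks_typical(s):
    return len(s) == 8 and all(c in string.ascii_lowercase for c in s)

# A generator that ignores the hash entirely still fools it every time.
fakes = [random_plaintext() for _ in range(100)]
fooled = sum(looks_typical(f) for f in fakes)

# The predictive objective is different: a model that ignores the hash
# can at best spread probability uniformly over all 26**8 plaintexts,
# paying 8*ln(26) ≈ 26 nats of cross-entropy on every pair, while a
# model that truly inverts the hash pays ~0. Generating typical
# instances buys you nothing here.
nats_without_cracking = -math.log((1 / 26) ** 8)
```

The point of the sketch is the gap between the two objectives: `fooled` is 100 out of 100 for the discriminator, while the cross-entropy cost only goes to zero for a model that has actually cracked the mapping.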
# Consider that some of the text on the Internet isn't humans casually chatting or speaking extemporaneously. It's the results section of a science paper. It's news stories that say what happened on a particular day. It's text that people crafted over hours or days.