This is what's so puzzling about arguments around LLMs and copyright. If virtually anything GPT-4 generates is a copyright violation because it was trained on copyrighted material, then that would imply that virtually all art produced by humans is a copyright violation given that humans wouldn't produce art unless they were inspired (i.e. trained) by previous art. How people don't understand that practically every piece of art or entertainment they've consumed is derivative, I don't know. Originality is, for the most part, relative, and if everything a trained intelligence produces is a copyright violation, then nothing is a copyright violation; the very concept becomes not only useless but counterproductive.
After all, without Plato, would we really have had Aristotle?
The problem is the missing reference. As an artist, you don't just copy or use a style, you make the reference obvious. As an author, you include a citation, or, if it's fiction, you openly allude to the reference. This is actually why we can form a proposition like the one on Plato and Aristotle (which probably should include Socrates). The problem with the kind of "remix" and transfer of generative AI is that the reference is lost in the process. (Which is also a serious problem for any kind of verification.)
Yes, when it’s clearly a derivative, but that’s not the common case.
Mostly our understanding is derived from thousands of places and comes out in different ways.
My kids learned to really read with Harry Potter. Must they pay J.K. Rowling each time they formulate complex ideas, create new stories or read another book?
In the same way, if a model was trained on thousands of anime pictures and can now generate in the same genre, I don’t think we can apply the copyright rules there.
Humans are also pretty notoriously bad at doing this. It's also very easy to be inspired by something then forget the source. Humans often perform reverse-attribution, where they later realize or someone else points out a similarity, and only then do they add the citation.
Granted. This is why we built a rather complex set of rules and enforced habits around this and, if there's any doubt, would rather dismiss a given work that doesn't adhere to these rules.
The difference seems to be in the level of fidelity of the copy. For example, we still attribute works to Socrates even though they were written down by Plato.
Some of these LLMs can produce code that is nearly identical to some file in the training set with the right prompt, similar to a human with a photographic memory. If a human for example copies a story or song, copyright law can and does consider it a copy even if produced purely from memory.
The line of originality was always somewhat arbitrary, and it likely will continue to be.