> Learning from copyrighted works to create new ones has never been protected by copyright
The term "learning" (I presume from "machine learning") shoulders a lot of weight. If we describe the situation more precisely, it involves commercially exploiting literature and other text media to produce a statistical corpus of texts, which is then commercially exploited. It's okay if that is licensed, but none of the AI companies bothered to license said original texts. Some (allegedly) just downloaded torrents of books, which is clear as day piracy. It has little to do with "learning" as used in common English — a person naturally retaining some knowledge of what they've consumed. Plain English "learning" doesn't describe the whole of what's happening with LLMs at all! It's a borrowed term, so let's not pretend it isn't.
What's happening is closer to buying some music cassettes, ripping parts of songs off them into various mixtapes, and selling them. The fact that the new cassettes "learned" the contents of the old ones, or that the songs are now jumbled up, doesn't change that the mixtape maker never had a license to copy the bits of music for commercial exploitation in the first place. After the infringement is done, the rest is smoke and mirrors...
>The term "learning" (I presume from "machine learning") shoulders a lot of weight. If we describe the situation more precisely, it involves commercially exploiting literature and other text media to produce a statistical corpus of texts, which is then commercially exploited.
It's "commercially exploiting literature" in the same sense that an author would if they read a bunch of novels and then wrote their own based on what they learned from the pre-existing text. The whole point in dispute is whether that turns into infringement when an AI does it.
By labeling only one of them as "commercially exploiting literature" but not the other, you're failing to distinguish them in any meaningful way, and basically arguing from name-calling.
>It has little to do with "learning" as used in common English — a person naturally retaining some knowledge of what they've consumed. Plain English "learning" doesn't describe the whole of what's happening with LLMs at all! It's a borrowed term, so let's not pretend it isn't.
It's fair that you can't just call them both "learning" and call it a day. But then the burden's on you to show how machine learning breaks from the time-honored tradition of license-free learning/"updating what you write based on having viewed other works". What's different? What is it about machine learning that makes it infringement in a way that it isn't when humans update their weights from having seen copyrighted works?
>What's happening is closer to buying some music cassettes, ripping parts of songs off them into various mixtapes, and selling them. The fact that the new cassettes "learned" the contents of the old ones, or that the songs are now jumbled up, doesn't change that the mixtape maker never had a license to copy the bits of music for commercial exploitation in the first place.
Okay, but (as above) to make that case, you'd need to identify where "acceptable" learning/"updating what you write based on having viewed other works" crosses over into the infringing mixtape example, and I have yet to see anyone try beyond "they're evil corps, it must be bad somehow".