Simply put, if the model isn’t producing an actual copy, they aren’t violating copyright (in the US) under any current definition.
As much as people bandy the term around, copyright has never applied to input, and the output of a tool is the responsibility of the end user.
If I use a copy machine to reproduce your copyrighted work, I am responsible for that infringement, not Xerox.
If I coax your copyrighted work out of my phone’s keyboard suggestion engine letter by letter, and publish it, it’s still me infringing on your copyright, not Apple.
If I make a copy of your clip art in Illustrator, is Adobe responsible? Etc.
Even if (as I’ve seen argued ad nauseam) a model was trained on copyrighted works on a piracy website, the copyright holder’s tort would be with the source of the infringing distribution, not the people who read the material.
Not to mention, I can walk into any public library and learn something from any book there. Would I then owe the authors of the books I learned from a fee to apply that knowledge?
> the copyright holder’s tort would be with the source of the infringing distribution, not the people who read the material.
Someone who just reads the material doesn't infringe. But someone who copies it, or prepares works that are derivative of it (which can happen even if they don't copy a single word or phrase literally), does.
> would I then owe the authors of the books I learned from a fee to apply that knowledge?
Facts can't be copyrighted, so applying the facts you learned is free, but creative works are generally copyrighted. If you write your own book inspired by a book you read, that can be copyright infringement (see The Wind Done Gone). If you use even a tiny fragment of someone else's work in your own, even if not consciously, that can be copyright infringement (see My Sweet Lord).
Right, but the onus of responsibility is on the end user publishing the song or creative work in violation of copyright, not the text editor, word processor, musical notation software, etc., correct?
A text prediction tool isn’t a person; the data it is trained on is irrelevant to the copyright infringement perpetrated by the end user. They should perform due diligence to prevent liability.
> A text prediction tool isn’t a person; the data it is trained on is irrelevant to the copyright infringement perpetrated by the end user. They should perform due diligence to prevent liability.
Huh what? If a program "predicts" some data that is a derivative work of some copyrighted work (that the end user did not input), then ipso facto the tool itself is a derivative work of that copyrighted work, and illegal to distribute without permission. (Does that mean it's also illegal to publish and redistribute the brain of a human who's memorised a copyrighted work? Probably. I don't have a problem with that). How can it possibly be the user's responsibility when the user has never seen the copyrighted work being infringed on, only the software maker has?
And if you say that OpenAI isn't distributing their program but just offering it as a service, then we're back to the original situation: in that case OpenAI is illegally distributing derivative works of copyrighted works without permission. It's not even a YouTube-like situation where some user uploaded the copyrighted work and they're just distributing it; OpenAI added the pirated books themselves.
If the output of a mathematical model trained on an aggregate of knowledge that contains copyrighted material is derivative and infringing, then ipso facto, all works since the inception of copyright are derivative and infringing.
You learned English, math, social studies, science, business, engineering, humanities, from a McGraw Hill textbook? Sorry, all creative works you’ve produced are derivative of your educational materials copyrighted by the authors and publisher.
> If the output of a mathematical model trained on an aggregate of knowledge that contains copyrighted material is derivative and infringing, then ipso facto, all works since the inception of copyright are derivative and infringing.
I'm not saying every LLM output is necessarily infringing, I'm saying that some are, which means the underlying LLM (considered as a work on its own) must be. If you ask a human to come up with some copy for your magazine ad, they might produce something original, or they might produce something that rips off a copyrighted thing they read. That means that the human themselves must contain enough knowledge of the original to be infringing copyright, if the human was a product you could copy and distribute. It doesn't mean that everything the human produces infringes that copyright.
(Also, humans are capable of original thought of their own - after all, humans created those textbooks in the first place - so even if a human produces something that matches something that was in a textbook, they may have produced it independently. Whereas we know the LLM has read pirated copies of all the textbooks, so that defense is not available)
You are saying that any output is possibly infringing, dependent on the input. This is actually, factually, verifiably false in terms of current copyright law.
No human, in the current epoch of education where copyright has been applicable, has learned, benefited, or exclusively created anything bereft of copyright. Please provide a proof otherwise if you truly believe so.
> You are saying that any output is possibly infringing, dependent on the input.
What? No. How did you get that from what I wrote? Please engage with the argument I'm actually making, not some imaginary different argument that you're making up.
> No human, in the current epoch of education where copyright has been applicable, has learned, benefited, or exclusively created anything bereft of copyright.
I do appreciate your point because it's one of the interesting side effects of AI to me. Revealing just how much we humans are a stack of inductive reasoning and not-actually-free-willed rehash of all that came before.
Of course, humans are also "trained" on their lived sensory experiences. Most people learn more about ballistics by playing catch than reading a textbook.
When it comes to copyright I don't think the point changes much. See the sibling comments which discuss constructive infringement and liability. Also, it's normal for us to have different rules for humans vs machines / corporations. And scale matters -- a single human just isn't capable of doing what the LLM can. Playing a record for your friends at home isn't a "performance", but playing it to a concert hall audience of thousands is.
My point isn’t adversarial, we most likely (in my most humble opinion) “learn” the same way as anything learns. That is to say, we are not unique in terms of understanding, “understandings”.
Are the ballistics we learn by physical interaction any different from the factual learning of ballistics that, for example, a squirrel learns, from their physical interactions?
Those software tools don't generate content the way an LLM does so they aren't particularly relevant.
It's more like if I hire a firm to write a book for me and they produce a derivative work. Both of us have a responsibility to guard against that.
Unfortunately there is no definitive way to tell if something is sufficiently transformative or not. It's going to come down to the subjective opinion of a court.
Copyright law is pretty clear on commissioned work: you are the holder. If your employee violated copyright and you failed to do your due diligence before publication, then you are responsible. If your employee violated copyright and fraudulently presented the work as original to you, then you would seek compensation from them.
> Copyright law is pretty clear on commissioned work: you are the holder. If your employee violated copyright and you failed to do your due diligence before publication, then you are responsible.
No, for commissioned work in the usual sense the person you commissioned from is the copyright holder; you might have them transfer the copyright to you as part of your contract with them but it doesn't happen by default. It is in no way your responsibility to "do due diligence" on something you commissioned from someone, it is their responsibility to produce original work and/or appropriately license anything they based their work on. If your employee violates copyright in the course of working for you then you might be responsible for that, but that's for the same reason that you might be responsible for any other crimes your employee might commit in the course of working for you, not because you have some special copyright-specific responsibility.
You mean the author. The creator of a commissioned work is the author under copyright law, the owner or copyright “holder” is the commissioner of the work or employer of the employee that created the work as a part of their job.
The author may contractually retain copyright ownership per written agreement prior to creation, but this is not the default condition for commissioned, “specially ordered”, works, or works created by an employee in the process of their employment.
The only way an employer/commissioner would be responsible (vicarious liability) for copyright infringement of a commissioned work or work produced by an employee would be if you instructed them to do so or published the work without performing the duty of due diligence to ensure originality.
> The creator of a commissioned work is the author under copyright law, the owner or copyright “holder” is the commissioner of the work or employer of the employee that created the work as a part of their job.
Nope. In cases where work for hire does apply (such as an employee preparing a work as part of their employment), the employer holds the copyright because they are considered as the author. But a work that's commissioned in the usual way (i.e. to a non-employee) is not a work-for-hire by default, in many cases cannot be a work-for-hire at all, and is certainly not a work-for-hire without written agreement that it is.
> The author may contractually retain copyright ownership per written agreement prior to creation, but this is not the default condition for commissioned, “specially ordered”, works
Nope. You must've misread this part of the law. A non-employee creator retains copyright ownership unless the work is commissioned and there is a written agreement that it is a work for hire before it is created (and it meets the categories for this to be possible at all).
> The only way an employer/commissioner would be responsible (vicarious liability) for copyright infringement of a commissioned work or work produced by an employee
What are you even trying to argue at this point? You've flipped to claiming the opposite of what you were claiming when I replied.
> duty of due diligence to ensure originality
This is just not a thing, not a legal concept that exists at all, and a moment's thought will show how impossible it would be to ever do. When someone infringes copyright, that person is liable for that copyright infringement. Not some other person who commissioned that first person to make something for them. That would be insane.
"(2) a work specially ordered or commissioned for use as a contribution to a collective work, as a part of a motion picture or other audiovisual work, as a translation, as a supplementary work, as a compilation, as an instructional text, as a test, as answer material for a test, or as an atlas, if the parties expressly agree in a written instrument signed by them that the work shall be considered a work made for hire. For the purpose of the foregoing sentence, a “supplementary work” is a work prepared for publication as a secondary adjunct to a work by another author for the purpose of introducing, concluding, illustrating, explaining, revising, commenting upon, or assisting in the use of the other work, such as forewords, afterwords, pictorial illustrations, maps, charts, tables, editorial notes, musical arrangements, answer material for tests, bibliographies, appendixes, and indexes, and an “instructional text” is a literary, pictorial, or graphic work prepared for publication and with the purpose of use in systematic instructional activities.
In determining whether any work is eligible to be considered a work made for hire under paragraph (2), neither the amendment contained in section 1011(d) of the Intellectual Property and Communications Omnibus Reform Act of 1999, as enacted by section 1000(a)(9) of Public Law 106–113, nor the deletion of the words added by that amendment—
(A) shall be considered or otherwise given any legal significance, or
(B) shall be interpreted to indicate congressional approval or disapproval of, or acquiescence in, any judicial determination,
by the courts or the Copyright Office. Paragraph (2) shall be interpreted as if both section 2(a)(1) of the Work Made For Hire and Copyright Corrections Act of 2000 and section 1011(d) of the Intellectual Property and Communications Omnibus Reform Act of 1999, as enacted by section 1000(a)(9) of Public Law 106–113, were never enacted, and without regard to any inaction or awareness by the Congress at any time of any judicial determinations."
Now your turn, quote the full passage of whatever law you think creates this "duty of due diligence" that you've been talking about.
>In the case of a work made for hire, the employer or other person for whom the work was prepared is considered the author for purposes of this title, and, unless the parties have expressly agreed otherwise in a written instrument signed by them, owns all of the rights comprised in the copyright.
You are responsible for infringing works you publish, whether they are produced by commission or employee.
Due diligence refers to the reasonable care, investigation, or steps that a person or entity is expected to take before entering into a contract, transaction, or situation that carries potential risks or liabilities.
Vicarious copyright infringement is based on respondeat superior, a common law principle that holds employers legally responsible for the acts of an employee, if such acts are within the scope and nature of the employment.
You haven't quoted anything about this supposed "duty of due diligence" which is what I asked for.
> In the case of a work made for hire...
Per what I quoted in my last post, commissioned works in the usual sense are not normally "works made for hire" so none of that applies.
> respondeat superior, a common law principle that holds employers legally responsible for the acts of an employee, if such acts are within the scope and nature of the employment.
i.e. exactly what I said a couple of posts back: "If your employee violates copyright in the course of working for you then you might be responsible for that, but that's for the same reason that you might be responsible for any other crimes your employee might commit in the course of working for you, not because you have some special copyright-specific responsibility."
How is the end user the one doing the infringement though? If I chat with ChatGPT and tell it “give me the first chapter of book XYZ” and it gives me the text of the first chapter, OpenAI is distributing a copyrighted work without permission.
If that’s the case, then sure, as I said in the first sentence of my comment, verbatim copies of copyrighted works would most likely constitute infringement.
> As much as people bandy the term around, copyright has never applied to input, and the output of a tool is the responsibility of the end user.
Where this breaks down though is that contributory infringement is still a thing if you offer a service that aids in copyright infringement and you don't do "enough" to stop it.
I.e., it would all be on the end user for folks that self-host or rent hardware and run an LLM or Gen Art AI model themselves. But folks that offer a consumer-level end-to-end service like ChatGPT or MidJourney could be on the hook.
Right, strictly speaking, the vast majority of copyright infringement falls under liability tort.
There are cases where infringement by negligence could be argued, but as long as there is a clear effort to prevent copying in the output of the tool, then there is no tort.
If the models are creating copies inadvertently and independently of the end user’s deliberate efforts, then yes, the creators of the tool would likely be the responsible party for infringement.
If I ask an LLM for a story about vampires and the model spits out The Twilight Saga, that would be problematic. Nor should the model reproduce the story word for word on demand by the end user. But it seems like neither of these examples is a likely outcome with current models.
The Pirate Bay crew was convicted of aiding copyright infringement. In that case you could not download derivatives from their service. Now you can get verbatim text from the models that any other traditional publisher would have to pay a license to print even a reworded copy of.
With that said, Creative Commons showed that copyright cannot be fixed; it is broken.