
> yes we're using copyrighted works, but

There’s no law against “using” copyrighted works, there is a law against copying and distributing them.

Fair use analysis doesn’t come into play unless we’re dealing with clearly established copyright infringement. What LLMs do doesn’t clearly qualify as any of the behaviors reserved to copyright owners. For example, it certainly doesn’t “copy” the things it’s trained on by any legal definition.

Law works on precedent and analogy when there are no clearly on-point statutes or case law. The most analogous situation to what transformer models do is a person learning from experience and creating their own work _influenced_ by what they’ve observed. That behavior is not copyright infringement by any stretch of the imagination. The fact that it’s done with a computer is not as important as people seem to think it is.



> For example, it certainly doesn’t “copy” the things it’s trained on by any legal definition.

What about generated pictures that still contain watermarks? Regardless of the actual legality, that hardly fits "certainly".

> The most analogous situation to what transformer models do is a person learning from experience and creating their own work _influenced_ by what they’ve observed

No, it is not. It is called machine "learning", but a butterfly is not a fly made out of butter: the label doesn't make it the same thing as human learning. Maybe courts will agree, maybe they won't, but the analogy to human learning is tenuous at best.


Here is the section of title 17 that defines the rights of copyright holders and what terms like “copy” mean in US law. It’s clear as mud, but I feel it’s likely that the process of training neural network weights is not going to be held equivalent to making verbatim digital copies. It’s just not the same thing, and the law has no clear provision for it except by analogy to existing human creative processes.

https://www.copyright.gov/title17/92chap1.html#106A

The most closely applicable existing law is that of “derivative works,” but derivative works require human authorship, so it’s far from clear that the concept applies to AI output either. Ultimately this is going to be hashed out in the courts until some actual laws are written to deal with it.

(IANAL)


> It’s clear as mud but I feel it’s likely that the process of training neural network weights is not going to be held as equivalent to verbatim digital copies.

It's taking verbatim digital copies and using a form of lossy compression to transform them, which I think is clear when looking at things like auto-encoders.
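To make the lossy-compression framing concrete, here is a minimal sketch (plain Python with NumPy, my own illustration rather than anything from the systems being discussed): a linear auto-encoder that forces data through a small bottleneck and then reconstructs it. The reconstruction error is nonzero, which is the sense in which what gets stored is a compressed, lossy summary rather than a verbatim copy.

    import numpy as np

    # Linear "auto-encoder": project data onto its top-k principal directions
    # (the bottleneck), then reconstruct. For squared error, the SVD/PCA
    # solution is the optimal linear encoder/decoder pair.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64))   # stand-in for the training "works"
    X = X - X.mean(axis=0)            # center the data

    k = 8                             # bottleneck dimension, much smaller than 64
    _, _, Vt = np.linalg.svd(X, full_matrices=False)

    def encode(x):
        return x @ Vt[:k].T           # 64 numbers in, 8 numbers out

    def decode(z):
        return z @ Vt[:k]             # 8 numbers back to an approximate 64

    X_hat = decode(encode(X))
    rel_err = np.mean((X - X_hat) ** 2) / np.mean(X ** 2)
    print(f"relative reconstruction error: {rel_err:.3f}")  # > 0: information was lost

A real model is nonlinear and trained by gradient descent rather than solved with an SVD, of course, but the same encode/decode-with-loss structure is what the auto-encoder comparison is pointing at.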


Isn’t your brain doing the same thing when it reads text or views a painting? Some people can even memorize and precisely recreate the things they’ve seen. But no one considers the process of lossy storage in human memory to be copyright infringement; instead, the later reproduction itself might be infringing. I think it will be the same here. Training models on copyrighted content won’t fall afoul of any existing law; instead, legal challenges will have to be aimed at specific instances where the models produce output that arguably infringes copyright.

That’s inconvenient for opponents of this technology because they would prefer to ban the training itself, but there’s not a good justification under existing law to do this.



