
> yes we're using copyrighted works, but

There’s no law against “using” copyrighted works, there is a law against copying and distributing them.

Fair use analysis doesn’t come into play unless we’re dealing with clearly established copyright infringement. What LLMs do doesn’t clearly qualify as any of the behaviors reserved to copyright owners. For example, it certainly doesn’t “copy” the things it’s trained on by any legal definition.

Law works on precedent and analogy when there are no clearly on-point statutes or case law. The most analogous situation to what transformer models do is a person learning from experience and creating their own work _influenced_ by what they’ve observed. That behavior is not copyright infringement by any stretch of the imagination. The fact that it’s done with a computer is not as important as people seem to think it is.



> For example, it certainly doesn’t “copy” the things it’s trained on by any legal definition.

What about generated pictures that still contain watermarks? Regardless of the actual legality, that hardly fits "certainly".

> The most analogous situation to what transformer models do is a person learning from experience and creating their own work _influenced_ by what they’ve observed

No, it is not. It is called machine "learning", but a butterfly is not a fly made out of butter: the label doesn't make it the same thing as human learning. Maybe courts will agree, maybe they won't, but the analogy to human learning is tenuous at best.


Here is the section of title 17 that defines the rights of copyright holders and what terms like “copy” mean in US law. It’s clear as mud, but I feel it’s likely that the process of training neural network weights is not going to be held equivalent to making verbatim digital copies. It’s just not the same thing, and the law has no clear provision for it except by analogy to existing human creative processes.

https://www.copyright.gov/title17/92chap1.html#106A

The most closely applicable existing law is that of “derivative works,” but derivative works require human authorship, so it’s far from clear that the concept applies to AI output either. Ultimately this is going to be hashed out in the courts until some actual laws are written to deal with it.

(IANAL)


> It’s clear as mud but I feel it’s likely that the process of training neural network weights is not going to be held as equivalent to verbatim digital copies.

It's taking verbatim digital copies and using a form of lossy compression to transform them, which I think is clear when looking at things like auto-encoders.
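To make the lossy-compression framing concrete, here is a minimal sketch (plain Python with NumPy, my own illustration rather than anything from the systems being discussed): a linear auto-encoder that forces data through a small bottleneck and then reconstructs it. The reconstruction error is nonzero, which is the sense in which what gets stored is a compressed, lossy summary rather than a verbatim copy.

    import numpy as np

    # Linear "auto-encoder": project data onto its top-k principal directions
    # (the bottleneck), then reconstruct. For squared error, the SVD/PCA
    # solution is the optimal linear encoder/decoder pair.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 64))   # stand-in for the training "works"
    X = X - X.mean(axis=0)            # center the data

    k = 8                             # bottleneck dimension, much smaller than 64
    _, _, Vt = np.linalg.svd(X, full_matrices=False)

    def encode(x):
        return x @ Vt[:k].T           # 64 numbers in, 8 numbers out

    def decode(z):
        return z @ Vt[:k]             # 8 numbers back to an approximate 64

    X_hat = decode(encode(X))
    rel_err = np.mean((X - X_hat) ** 2) / np.mean(X ** 2)
    print(f"relative reconstruction error: {rel_err:.3f}")  # > 0: information was lost

A real model is nonlinear and trained by gradient descent rather than solved with an SVD, of course, but the same encode/decode-with-loss structure is what the auto-encoder comparison is pointing at.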


Isn’t your brain doing the same thing when it reads text or views a painting? Some people can even memorize and precisely recreate the things they’ve seen. But no one considers the process of lossy storage in human memory to be copyright infringement; instead, the later reproduction itself might be infringing. I think it will be the same here. Training models on copyrighted content won’t fall afoul of any existing law; instead, legal challenges will have to be aimed at specific instances where the models produce output that arguably infringes copyright.

That’s inconvenient for opponents of this technology because they would prefer to ban the training itself, but there’s not a good justification under existing law to do this.



