
> the people whose work has been stolen to create it

"Stolen" is kind of a loaded word. It implies the content was for sale and was taken without payment. I don't think anyone would accuse a person of stealing if they purchased GRRM's books, studied the prose and then used the knowledge they gained from studying to write a fanfic in the style of GRRM (or better yet, the final 2 books). What was stolen? "the prose style"? Seems too abstract. (yes, I know the counter argument is "but LLMs can do more quickly and at a much greater scale", and so forth)

I generally want less copyright, not more. I'm imagining a dystopian future where every article on the internet has an implicit huge legal contract you enter into like "you are allowed to read this article with your eyeballs only, possibly you are also allowed to copy/paste snippets with attribution, and I suppose you are allowed to parody it, but you aren't allowed to parody it with certain kinds of computer assistance such as feeding text into an LLM and asking it to mimic my style, and..."



AI has been trained on pirated material, and that is very different from someone buying books, reading them, and learning from them. Right now it's still up to the courts what counts as infringing, but at this point even Disney is accusing AI of violating their copyrights: https://www.nytimes.com/2025/06/11/business/media/disney-uni...

AI models output copyrighted material: https://www.nytimes.com/interactive/2024/01/25/business/ai-i... and they can even be ranked by the extent to which they do it: https://aibusiness.com/responsible-ai/openai-s-gpt-4-is-the-...

AI is getting better at data laundering and hiding evidence of infringement, but ultimately it's collecting and regurgitating copyrighted content.


> at this point even Disney is accusing AI of violating their copyrights

"even" is odd there, of course Disney is accusing them of violating copyright, that's what Disney does.

> AI is getting better at data laundering and hiding evidence of infringement, but ultimately it's collecting and regurgitating copyrighted content.

That's not the standard for copyright infringement; AI is a transformative use.

Similarly, if you read a book and learn English or facts about the world by doing that, the author of the book doesn't own what you just learned.


Facts aren't copyrightable. Expression is. LLMs reproduce expression from the works they were trained on. The way they are being trained involves making an unlicensed reproduction of works. Both of those are pretty straightforwardly infringement of an exclusive right.

Establishing an affirmative defense that it's transformative fair use would hopefully be an uphill battle, given that it's commercial, using the whole work, and has a detrimental effect on the market for the work.


> AI is a transformative use.

Reproducing a movie still well enough that I honestly wouldn't know which one is the original is transformative?


The still is not transformative but the model reproducing it is obviously transformative. Other general purpose tools can be used to infringe and yet are non-infringing as well.


If I watch a movie, then draw a near-perfect likeness of the main character from my very good memory, put it on a t-shirt, and sell the t-shirt, that is grounds for a copyright infringement claim if the source isn't yet in the public domain (not guaranteed to succeed, but open to a lawsuit).

If I download all the content from a website whose use policy states that all content is owned by that website and can't be resold, then allow my users to query this downloaded data and receive a detailed summary of all related content, and sell that product, perhaps that is a violation of the use policy.

None of this has been properly tested in the courts yet. Large payments have already been made to Reddit to avoid it, likely because Reddit has the means to fight this in court. My little blog, though, is fair game, because I can't afford to engage.


For sure, it's rich people playing "rules for thee but not for me". What's interesting is that we'll discover which side of the can-afford-to-enforce-its-copyright boundary the likes of the NYTimes fall on.


That’s not “data laundering and hiding evidence of infringement” though.

You’re talking about overt infringement, the GP was talking about covert infringement. It’s difficult to see how something could be covert yet not transformative.


Stolen doesn't imply anything is for sale, does it? Most things that are stolen are not for sale.


I think there is a case to be made that AI companies are taking the content, providing people with a modified version of that content, and not necessarily providing references to the original material.

Much of the content that people create is created to generate revenue. They are denied that revenue when people don't go to their site. One might interpret that as theft. In the case of GRRM's books, I would assume they were purchased and the author received the revenue from the sale.


I think you are missing some context. They were using Anna's Archive! They paid for nothing, downloaded the material in violation of copyright, and processed it. They violated US copyright law even before they actually ingested it!


Yes, there are ethical differences between an individual doing things by hand and a corporation funded by billions of investor dollars doing an automated version of that thing at many orders of magnitude greater scale.

Also, LLMs don’t just imitate style, they can be made to reproduce certain content near-verbatim in a way that would be a copyright violation if done by a human being.

You can excuse it away if you want with reductio ad absurdum arguments, but the impact is distinctly different and calls for different parameters.


> It implies the content was for sale and was taken without payment

that's literally what happened in innumerable individual cases, though.



