And the current norm that the trillion dollar companies have lobbied for is that you can train on copyrighted material all you want so that's the reality we are living in. Everything ever published is all theirs.
>And the current norm that the trillion dollar companies have lobbied for is that you can train on copyrighted material all you want so that's the reality we are living in. Everything ever published is all theirs.
What "lobbied"? Copyright law hasn't materially changed since AI got popular, so I'm not sure where these lobbying efforts are showing up. If anything, the companies that have lobbied hard in the past (e.g. media companies) are opposed to the current status quo, which seems to favor AI companies.
I am really surprised that media businesses, which are extremely influential around the world, have not pushed back against this more. I wonder whether they see the cost savings they will get from the technology as a worthwhile trade-off.
Yeah, the short-term win is to enter a licensing agreement so you get some cash for a couple of years, and meanwhile pray that someone else with more money fights the legal battle to try to set a precedent for you.
To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use.
Once training is established as fair use, it doesn't really matter if the license is MIT, GPL, or a proprietary one.
Great, so the US and China can duke it out trying to create AGI or whatever, whereas most other countries are stuck in the past because of their copyright laws?
France and most of Europe have fair use (https://fr.wikipedia.org/wiki/Copie_priv%C3%A9e), but also a mandatory tax on every storage medium sold, to recover the "lost fees" due to fair use.
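Is that not just an exemption for copying for private use? My French is not up to much but this:
> L'exception de copie privée autorise une personne à reproduire une œuvre de l'esprit pour son usage privé, ce qui implique l'utilisation personnelle, mais également dans le cercle privé incluant le cadre familial. [In English: "The private copying exception allows a person to reproduce a creative work for their private use, which implies personal use, but also use within the private circle, including the family setting."]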
seems to be only for personal use?
Fair dealing in the UK and other countries is broader, and US fair use broader still.
> To my understanding, if the material is publicly available or obtained legally (i.e., not pirated), then training a model with it falls under fair use.
That is just the sort of point I am trying to make. That is a copyright law issue, not a contractual one. If the GPL is a contract then you are in breach of contract regardless of fair use or equivalents.
It's not specific to open source, but it's most clearly enforceable with open source, as there will be many contributors from many jurisdictions, with the one unifying factor being that they all made their copyrighted work available under the same license terms.
With proprietary or, more importantly, single-owner code, it's far easier for this to end up in a settlement rather than being dragged out into an actual ruling, enforcement action, and establishment of precedent.
That's the key detail. It's not specific to GPL or open source, but if you want to see these orgs held to account and some precedent established, focusing on GPL and FOSS-licensed code is the clearest path to that.
That part of the article is about US cases, so it's US law that applies.
> A GPL license is a contract in most other countries. Just not US probably.
Not just the US. It may vary with the version of the GPL too. Wikipedia claims it's a civil law vs. common law country difference, though I'm not sure the citation shows that.
A lot of it boils down to whether training an LLM is a breach of copyright in the training materials, which is not specific to GPL or open source.