That wouldn't matter too much though - how often do you worry about competitors directly stealing your code? Either it's server-side, or it's obfuscated, or it's compiled. Anyway, there's never much code that's so special it needs heavy legal protection against being copied, and if an LLM produced it, you can just use another LLM to reproduce the same feature. And say it's 99% LLM and 1% human: who's going to know which 1% is not safe to copy?
You have probably misunderstood how GPL "infection" works (which is very common).
If your closed-source project uses some GPL code, that doesn't automatically put your whole project in the public domain or under the GPL. It just means you're infringing the code author's copyright, and they can sue you (for damages and to make you stop using their code, not to make your whole project GPL).
In the simplest terms, the GPL is:
    if codebase.is_gpl_compatible:
        gpl_code.give_permission(codebase)
    elif codebase.is_using(gpl_code):
        # the copyright owner and the court deal with this under ordinary copyright law
        raise CopyrightInfringement
The GPL can't do much more than that. A license over one piece of code cannot automatically change the copyright status of another piece of code. There simply isn't a legal framework for that.
Similarly, the copyleft status of AI code can't affect the rest of the codebase, unless we make new laws specifically saying so.
Also similarly, even if GitHub loses the class action, that will NOT automatically release the model to the public under the GPL. It would open the possibility for all the GPL repo authors to ask Microsoft for compensation for stealing their code.
> It just means you're infringing the code author's copyright, and they can sue you (for damages and to make you stop using their code, not to make your whole project GPL).
they can sue you and settle for whatever you will accept that makes them happy.
if you lose, then the alternative to making your code GPL is to make your code disappear; that is, you are no longer allowed to sell your product.
consequently, if AI code is subject to the GPL then the rest of the codebase is too, or the alternative would be that the codebase could not be distributed.
First of all, pure AI-generated code is uncopyrightable now. Uncopyrightable code can't be under GPL.
Secondly, GPL can't "make your (proprietary) code disappear." Violating GPL is essentially just stealing code. One cannot distribute the version that includes stolen code. But they can remove the stolen part and replace it with their own code. Of course they still need to settle/pay for the previous infringement.
The GPL simply can't affect the copyright status of the rest of the codebase, because it's a license, not a contract. It cannot restrict the user's rights further than copyright law does.
Again, it's a very common misunderstanding of the GPL's "virality." There has been a decades-long debate about whether the GPL should be treated as a contract rather than a mere license, but no ruling has given it this special legal status (yet), at least in the US.
> First of all, pure AI-generated code is uncopyrightable now. Uncopyrightable code can't be under GPL.
if AI generates something that is equal to existing code, then the license of that code applies. the AI generated product as a whole can't be copyrighted, but the portions that reproduce copyrighted code retain the original copyright.
> they can remove the stolen part and replace it with their own code
sure, if they can do that, then they can distribute their code again. but until then they can't.
> if AI generates something that is equal to existing code, then the license of that code applies.
No, it doesn't, if the generation is independent of the existing code. If a person using AI takes existing code and makes a literal copy of it, then, yes, the copyright (and any license offer applicable in the circumstances) of the existing code may apply. It may also not apply, the same as with copies of portions of code made by other means. And it's far from clear (especially for small portions of code) that such a copy has legally been made merely because a work is in the training set.
Copyright protects against copying. It doesn't protect against someone creating the same content by means other than copying.
btw, this is the reason why people who at some point may have had access to Windows source code are not allowed to work on Wine. because if Wine accidentally reproduces Windows code, the only defense is that none of the contributors have ever seen that code before.
if AI has seen that code in training, then this defense is no longer possible.
No. The defense would almost certainly be practically impossible if you reproduced the entire work and there was evidence that you had perused it, because it would be very hard to convince a trier of fact that the duplication really was coincidence rather than copying. But it might be a very different story if you had read Harry Potter and then wrote another work that includes the text “Up!” she screeched. (which appears verbatim in the first volume of the series).
You can use GPL code in proprietary code. You just can't distribute said proprietary code if you don't also distribute its sources in accordance with the GPL, and that is how the "infection" happens.
But then we would need a way to prove that some code was LLM generated, right?
Like if I copy-paste GPL-licenced code, the way you realise that I copy-pasted it is because 1) you can see it and 2) the GPL-licenced code exists. But when code is LLM generated, it is "new". If I claim I wrote it, how would you oppose that?
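To make points 1) and 2) concrete, here is a rough Python sketch of how a literal copy gets spotted when both sources are visible. The function names and the 3-line window are my own assumptions for illustration, not any real tool's method. It only catches near-verbatim runs, which is exactly why it says nothing about LLM output that doesn't match an existing file.

```python
from __future__ import annotations


def shingles(source: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-line windows in `source` (hypothetical helper)."""
    # Collapse whitespace and drop blank lines so reformatting doesn't hide a copy.
    lines = [" ".join(line.split()) for line in source.splitlines()]
    lines = [line for line in lines if line]
    return {tuple(lines[i:i + k]) for i in range(len(lines) - k + 1)}


def overlap_ratio(candidate: str, known: str, k: int = 3) -> float:
    """Fraction of the candidate's k-line windows that also appear in `known`."""
    cand = shingles(candidate, k)
    if not cand:
        # Candidate is shorter than k lines: nothing to compare.
        return 0.0
    return len(cand & shingles(known, k)) / len(cand)
```

A high ratio against a known GPL file is evidence you can point at; a ratio near zero proves nothing either way about how the code was produced.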
The same way a current product advertises itself as hand-crafted, or organic, etc. Just pure trust. And if you can charge a higher price because this is considered valuable by your customers, then you get to sustain it as a business.