I asked this in the other thread (no response, but I was a bit late)
How does anyone using AI like this have confidence that they aren't unintentionally plagiarizing code and violating the terms of whatever license it was released under?
For random personal projects I don't see it mattering that much. But if a large corp is releasing code like this, one would hope they've done some due diligence that they haven't simply stolen the code from some similar repo on GitHub, laundered through an LLM.
The only relevant section of the readme doesn't mention checking similar projects or libraries for common code:
> Every line was thoroughly reviewed and cross-referenced with relevant RFCs, by security experts with previous experience with those RFCs.
> How does anyone using AI like this have confidence that they aren't unintentionally plagiarizing code and violating the terms of whatever license it was released under?
Most of the code generated by LLMs, and especially the code you actually keep from an agent, is mid, replacement-level, boring stuff. If you're not already building projects with LLMs, I think you need to start doing that first before you develop a strong take on this. From what I see in my own work, the code being generated is highly unlikely to be distinguishable. There is more of me and my prompts and decisions in the LLM code than there can possibly be defensible IPR from anybody else, unless the very notion of, like, wrapping a SQLite INSERT statement in Golang is defensible.
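To make "wrapping a SQLite INSERT statement in Golang" concrete, here's a minimal sketch of the kind of commodity code being described (the names, schema, and driver choice are invented for illustration, not taken from any particular project). Code at this level is written essentially the same way by everyone, which is the point about distinguishability:

```go
// Hypothetical illustration: a thin Go wrapper around a SQLite INSERT
// via database/sql. Near-identical versions of this exist everywhere.
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/mattn/go-sqlite3" // driver registration only
)

// insertUser writes one row and returns the generated row ID.
func insertUser(db *sql.DB, name, email string) (int64, error) {
	res, err := db.Exec(
		"INSERT INTO users (name, email) VALUES (?, ?)",
		name, email,
	)
	if err != nil {
		return 0, fmt.Errorf("insert user: %w", err)
	}
	return res.LastInsertId()
}

func main() {
	db, err := sql.Open("sqlite3", ":memory:")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	if _, err := db.Exec(`CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)`); err != nil {
		log.Fatal(err)
	}

	id, err := insertUser(db, "alice", "alice@example.com")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("inserted row", id)
}
```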
The best way I can explain the experience of working with an LLM agent right now is that it is like if every API in the world had a magic "examples" generator that always included whatever it was you were trying to do (so long as what you were trying to do was within the obvious remit of the library).
Safety in the shadow of giant tech companies. People were upset when Microsoft released Copilot trained on GitHub data, but nobody who cared could do anything about it, and nobody who could have done something about it cared, so it just became the new norm.
All of the big LLM vendors have a "copyright shield" indemnity clause for their paying customers - a guarantee that if you get sued over IP for output from their models their legal team will step in to fight on your behalf.
This is an excellent question that the AI-boosters always seem to dance around. Three replies are already saying “Nobody cares.” Until they do. I’d be willing to bet that some time in the near future, some big company is going to care a lot and that there will be a landmark lawsuit that significantly changes the LLM landscape. Regulation or a judge is going to eventually decide the extent to which someone can use AI to copy someone else’s IP, and it’s not going to be pretty.
It just presumes a level of fixation in copyright law that I don’t think is realistic. There was a landmark lawsuit, MAI v. Peak Computer (1993), in which the court held that repairing a computer without the permission of the operating system’s author was copyright infringement, and it didn’t change the landscape at all, because everyone immediately realized it’s not practical for things to work that way (Congress eventually overrode the ruling with a repair exception in the DMCA). There’s no realistic world where AI tools end up being extremely useful but nobody uses them because of a court ruling.
I'm fairly confident that it's not just plagiarizing because I asked the LLM to implement a novel interface with unusual semantics. I then prompted for many specific fine-grained changes to implement features the way I wanted. It seems entirely implausible to me that there could exist prior art that happened to be structured exactly the way I requested.
Note that I came into this project believing that LLMs were plagiarism engines -- I was looking for that! I ended up concluding that this view was not consistent with the output I was actually seeing.
The consensus, right or wrong, is that LLM-produced code (unless repeated verbatim) is equivalent to you or me legitimately stating our novel understanding of mixed sources, some of which may be copyrighted.