
I really appreciated the argument in the book "The New Breed," which is that we should adapt ideas around the governance of animals to governance of ML: You can train your dog to attack random passersby, but if you do, you're a monster and ultimately responsible for the dog's actions.

Likewise, you can tell Copilot to crank out specific algorithms written by specific people, but if you do, you're still creating infringing code, same as if you'd taken the more direct route of ctrl-c+ctrl-v. The fact that you /can/ make the algorithm misbehave through adversarial input is irrelevant to the primary use cases, which lead to boring, non-infringing code completions.



This just sounds like blaming the researchers to me. How would I ever know if my "boring code completion" was actually copyright infringement?

Your argument just disallows discussing the problem while doing absolutely nothing about it.

If you train your dog to NOT attack random passersby and it still does, that dog is euthanized no matter your intentions.


> If you train your dog to NOT attack random passersby and it still does, that dog is euthanized no matter your intentions.

Of course, but you will not face manslaughter charges in that case.

So, following the same logic, if you train your copilot NOT to infringe on other people's copyright and it still does, it should be destroyed no matter your intentions. But at least you won't be charged with copyright violation yourself.

That said, I don't believe Microsoft's actions to be benign. I think this copyright whitewashing scheme is fully in line with their old MO, purposefully creating a legal quagmire surrounding all open source code.


Personally, I think you are right to distrust MS (as we should distrust any corporation, really). I will admit that this attempt is working, in the sense that it is a lot less clear to a non-computer person as to:

- whether there are any damages
- what the big deal is

In my mind, the entire thread has identified a lot of those, but I think someone already said that it will likely be tested in court (and I have zero idea which way it will turn).

For the record, I personally think Copilot is a cool tool (frankly, it is not that different from an automated Stack Exchange in terms of results). If I worry about anything, it is that overall standards will decline even further.


Tim Davis doesn't actually have any instance of copyright infringement to complain about; he was able to induce Copilot to /mostly/ recreate his code through careful prompting, but no one has actually deployed the code. By the same token, we don't outlaw ctrl-c and ctrl-v buttons on computers.

There is plenty of space here to discuss developing tools to check for unintentional infringement. I would guess, though, that such tools would sweep up a whoooole lot of non-Copilot human usage and make it much harder to deploy anything new.
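For a sense of what such a checker might look like, here is a toy sketch; the window size, the tokenization, and the example_licensed.c entry are all invented for illustration, and this isn't any real tool's behavior. It fingerprints overlapping token windows of a completion against a pre-built index of licensed code:

    import hashlib

    N = 12  # tokens per matching window; a made-up threshold

    def fingerprints(code):
        # Hash every run of N consecutive tokens. Whitespace tokenization
        # is a simplification; a real tool would normalize identifiers too.
        tokens = code.split()
        for i in range(len(tokens) - N + 1):
            window = " ".join(tokens[i:i + N])
            yield hashlib.sha1(window.encode()).hexdigest()

    # Hypothetical licensed corpus; in practice this index would be
    # built offline over whole repositories.
    licensed_sources = {
        "example_licensed.c (GPL-2.0)":
            "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] * x [ i ] ; }",
    }
    index = {fp: src
             for src, code in licensed_sources.items()
             for fp in fingerprints(code)}

    def check(completion):
        # Report which licensed sources this completion overlaps with.
        return {index[fp] for fp in fingerprints(completion) if fp in index}

    print(check("for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] * x [ i ] ; }"))
    # -> {'example_licensed.c (GPL-2.0)'}

Even this toy version hints at the sweep-up problem: common idioms and boilerplate would collide with the index constantly, flagging plenty of code no human ever copied.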

So, maybe a better discussion to have here is how to make the animal safer, not how to outlaw the animal entirely. Single-line completions (the majority of Copilot usage) aren't infringing, and the same is probably true of almost any few-line completion. So, capping the amount of consecutive auto-completed code might be a reasonable 'muzzle' to keep the model reasonably safe, as sketched below.
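Mechanically, such a muzzle could live entirely in the editor plugin. A minimal sketch, where the cap of 3 consecutive machine-written lines is an arbitrary number chosen for illustration, not anything Copilot actually does:

    MAX_CONSECUTIVE = 3  # arbitrary cap on consecutive auto-completed lines

    def muzzle(completion: str, lines_already_accepted: int) -> str:
        # Truncate a suggestion so the user never accepts more than
        # MAX_CONSECUTIVE machine-written lines in a row; typing anything
        # by hand would reset lines_already_accepted to zero.
        budget = MAX_CONSECUTIVE - lines_already_accepted
        if budget <= 0:
            return ""
        return "\n".join(completion.splitlines()[:budget])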


I think we have an ideological disagreement here. I'm not part of the "open source" movement, I believe in free software. Although I'm not prolific, I have authored some free software and shared it widely. I want people to have it, use it, and share it, so long as they extend the same rights to their users.

Now my software has been assimilated into a proprietary blob. Had that blob been free, like my software within it, I would have accepted it, but it's not. It's controlled exclusively by Microsoft and OpenAI, two entities which I place no trust in.

For me the dog has already bitten. The free software I extended to an audience I believe would show the same generosity has instead been made into a proprietary product.

The "copyright" question for me is not a question of "fairness" or ability of Microsoft or anyone else to make a product. For me it's a tool to protect my contribution from proprietary business.

Basically, I don't want the animal safer, I want it free (according to the FSF freedoms).


> The free software I extended to an audience I believe would show the same generosity has instead been made into a proprietary product.

That's exactly my complaint about Copilot. And since all code hosted on GitHub is now subject to this land-grab, my only recourse is not to use GitHub any more if I want to publish a project of mine.


Code ingestion is not limited to GitHub or Copilot. Your best recourse is to make your code publicly inaccessible.


I am not sure your software has been assimilated into a proprietary blob. Rather, it's been sliced into its constituent parts and those parts have been tagged by the proprietary blob.

Your code isn't being run by Copilot, as such; it's been categorized in a way that allows partial retrieval without the license or attribution. This might seem like a distinction without a difference, but it's kind of a ship-of-Theseus problem: probably nobody is running any of your programs in their entirety, but it's very possible that bits of your code have found their way into other people's programs. How do you distinguish between contributions that are uniquely yours and those which are just helper functions, or cobbled together from other example code, e.g. in documentation or from a book or Q&A website?


I am not a lawyer, but I don't think anyone needs to deploy the code in order to infringe copyright: they just need to distribute the code to a third party (hence copyright -- the right to copy). And on the face of it, Microsoft would appear to have distributed Tim Davis's code, in compressed form, as part of the trained language model in Copilot.


But in this case Copilot is not equivalent to copy-paste. When doing copy-paste, you are acting with knowledge of the source of the copied code and with intent to copy it.

With Copilot, you are acting neither with knowledge of the source nor with intent to copy; in fact, I'm sure users have a reasonable expectation that the tool does not copy-paste existing code verbatim.

IANAL, but I'm pretty sure that intent matters a lot.

Popcorn Time was also just a tool that let you stream data from torrents. That didn't seem to help them put up a legal defence (nor should it have, because the intent was pretty clear on that one).

And seriously, if cases exist where the only thing a tool does (albeit via a VERY complex implementation path) is strip a license from a piece of code and serve that code up via an API, then it really does sound like the creators of the tool are at fault.


If you build a system that has a high likelihood of breaking the law in normal expected use, and then it's found to break the law, shouldn't we disincentivize that in some way? Is that just blaming the researchers/developers, or is that just making people respect the law?

I think the important thing to note in both dog attack scenarios presented is that the owner is responsible in both cases. Either they purposefully created an unsafe situation or they were negligent in protecting the public from their property. Whether the dog is euthanized is about preventing it from happening again. Preventing it from happening in the first place is done by making the owner liable to disincentivize it.


I'd argue that the law was already broken when my free and viral software was included in a non-free package.

Personally, I don't care about the end users. If you want to read my source, I welcome that. I just want the Copilot model and system to be open, since it was based (in part) on my work. Otherwise, they are free to remove my work.


And you are free to sue them.

What was your plan for when someone eventually infringed on your work?

If you want people to abide by your license, you have to enforce it yourself.


This article is about pursuing a class action lawsuit…


"Not knowing" doesn't free you from responsibility.

If you took a bunch of copyrighted and non-copyrighted books, cut them into pieces, shuffled them all together, and then picked a passage at random from a hat, "not knowing" what you were going to get wouldn't mean you weren't violating copyright.

That's essentially what Copilot is doing: it takes a bunch of code - some of it copyrighted and unlicensed for such use - and uses it as a dataset. The ML algorithm then tries to pattern-match against that data to give the user something they want. That's just a copyright-violation lottery with extra steps.
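The lottery framing can be made literal. A toy sketch with invented passages, showing that shuffling hides which text you draw but not how often the draw is protected:

    import random

    # Made-up pile: 60% protected passages, 40% public-domain ones.
    pile = [("copyrighted", f"passage {i}") for i in range(600)]
    pile += [("public domain", f"passage {i}") for i in range(400)]
    random.shuffle(pile)

    draws = [random.choice(pile) for _ in range(10_000)]
    hits = sum(1 for status, _ in draws if status == "copyrighted")
    print(f"{hits / len(draws):.0%} of blind draws were protected text")
    # Prints roughly 60%: the shuffle only obscures *which* passage
    # comes out of the hat, not how often it is a protected one.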



