One of the reasons that comes to my mind is that it could have looked problematic for only Microsoft (Copilot) to have access to GitHub for training AI models, à la monopolizing a data treasure trove. With anti-competitive legislation catching up to Google and forcing it to open up its Play Store, this could have been one of the key reasons why this deal came about.
Copilot can choke on my AGPL code on GitHub, which was used to train their proprietary models. I'm still salty about this; sadly, it looks like the world has largely moved on.
It really feels like a digital form of colonialism: they come in, take everything, completely disregard the rules and ignore intellectual property and copyright laws (while you still have to obey them), but when you speak out against this, suddenly you are a Luddite who doesn't care about human progress.
> Silicon Knights had "deliberately and repeatedly copied thousands of lines of Epic Games' copyrighted code, and then attempted to conceal its wrongdoing by removing Epic Games' copyright notices and by disguising Epic Games' copyrighted code as Silicon Knights' own"
> Epic Games prevailed against Silicon Knights' lawsuit, and won its counter-suit for $4.45 million on grounds of copyright infringement
> following the loss of the court case, Silicon Knights filed for bankruptcy
The Claude terms of service [1] apparently preclude Anthropic or AWS from using GitHub user data for training:
> GitHub Copilot uses Claude 3.5 Sonnet hosted on Amazon Web Services. When using Claude 3.5 Sonnet, prompts and metadata are sent to Amazon's Bedrock service, which makes the following data commitments: Amazon Bedrock doesn't store or log your prompts and completions. Amazon Bedrock doesn't use your prompts and completions to train any AWS models and doesn't distribute them to third parties.