Change of venue and jury sequestration are controlled by the judge, not the prosecutor, and thus definitionally cannot be prosecutorial misconduct. And it's not misconduct for a prosecutor to disparage the defense in front of the jury unless it's extreme or falls into specific categories. Generally the prosecutor's entire job is to make the jury not believe the defense's story.
There's a decent bit of caselaw indicating that computers reading and using a copyrighted work simply "don't count" in terms of copyright infringement -- only humans can infringe copyright. This article[0] does a pretty good job of summarizing the rationale that the courts have provided. My (non-lawyer) take is that GitHub is pushing this just half a step farther -- if computers can consume copyrighted material, and use it to answer questions like "was this essay plagiarized", then in GitHub's view they can also use it to train an AI model (even if it occasionally spits back out snippets of the copyrighted training data). Microsoft has enough lawyers on staff that I'm sure they have analyzed this in depth and believe they at least have a defensible position.
Makes me wonder what would happen if a similar thing was done with books. If I train an AI on all the texts of Tom Clancy, or Stephen King, or every Star Wars novel, and the books it generates every so often produce paragraphs verbatim from one of those sources, would copyright owners be up in arms? What would the distinction be between the code case and the text case?
I am not a lawyer. I do photography and have a more than passing interest in copyright as it applies to the photographs I take and the material I photograph.
Rather than text, my AI copyright hypothetical: consider a model trained on sunset photographs. You take a regular photograph, pass it through the model, and it transforms it into a sunset. The model was trained on copyrighted works, but the model itself is considered fair use.
Now, I go and take a photograph from some location during the day and then pass it through the transformer and get a sunset. Yay me! Unbeknownst to me, that location is a favorite location for photographers, and there were sunsets from that location used in the training data. My photograph, transformed to look like a sunset, is now similar to one of them in the training data.
Is my transformed photograph a derivative work of the one in the training data to which it bears similarity? How would a judge feel about it? How does the photographer whose photograph was used in the training data feel?
What would be interesting in that case would be how the transformed image would look if photos from that location were removed from the training set. That would help reveal whether it was just copying what it had seen or it actually remembered what sunsets looked like and transformed the image using its memory of sunsets in general.
This will surely happen within the next few years, but if the "new work" contains a full paragraph from an existing novel, the copyright hammer would come down hard.
Maybe it needs to be paired with another network / hunk of code that checks for verbatim copying?
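A crude version of that checker could be a plain n-gram filter over the training corpus. This is just a sketch with made-up names; a real system would need proper tokenization and an index that scales far beyond an in-memory set:

```javascript
// Sketch of a verbatim-copy filter: flag generated text that shares a
// long-enough token run with the training corpus. All names hypothetical.
function ngrams(tokens, n) {
  const out = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    out.push(tokens.slice(i, i + n).join(" "));
  }
  return out;
}

// Build a set of every n-gram seen in the training data.
function buildIndex(corpusDocs, n) {
  const index = new Set();
  for (const doc of corpusDocs) {
    for (const g of ngrams(doc.split(/\s+/), n)) index.add(g);
  }
  return index;
}

// True if any n-token window of the output appears verbatim in the corpus.
function looksCopied(output, index, n) {
  return ngrams(output.split(/\s+/), n).some(g => index.has(g));
}

const index = buildIndex(["float q_rsqrt(float number)"], 3);
console.log(looksCopied("float q_rsqrt(float number) { ... }", index, 3)); // true
console.log(looksCopied("int main(void) { return 0; }", index, 3)); // false
```

Of course, this only catches exact copies; near-verbatim output with renamed variables would sail right through.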
> There's a decent bit of caselaw indicating that computers reading and using a copyrighted work simply "don't count" in terms of copyright infringement -- only humans can infringe copyright.
I have read variations of "computers don't commit copyright" more times than I can count in the past few days.
How is Copilot different from a compiler? (Please give me the legal answer, not the technical answer. I know the difference between Copilot and a compiler, technically.)
Isn't a compiler a computer program? How is its output covered by copyright?
Am I fundamentally misunderstanding something here?
What if I made a few tweaks to Copilot so that it is very likely to reproduce, verbatim, large chunks of code that I would like to use without attribution, such as the Linux kernel? Do you really think you can write a computer program that magically "launders" IP?
A compiler is run on original sources. I don't see any analogy here at all.
* They both can combine different works to create a derivative work of each work. (Compilers do this with optimizations, especially inlining with link-time optimization.)
They really do the same things, and yet, we say that the output of compilers is still under the license that the source code had. Why not Copilot?
Because the sources used for input do not belong to the person operating the tool.
If you say that doesn't matter, then you are saying open source licenses don't matter because the same thing applies - I could just run a tool (compiler) on someone else's code, and ignore the terms of their license when I redistribute the binary.
If I take some code I don’t have a license for, feed it to a compiler (perhaps with some -O4 option that uses deep learning because buzzwords), then is the resulting binary covered under fair use, and therefore free of all license restrictions?
If not, then how is what Copilot is doing any different?
> If I take some code I don’t have a license for, feed it to a compiler (perhaps with some -O4 option that uses deep learning because buzzwords), then is the resulting binary covered under fair use
No, the binary is not free of license restrictions. Read any open source license - there are terms under which you can redistribute a binary made from the code. For GPL you have to make all your sources available under the same terms, for example. For MIT you have to include attribution. For Apache you have to attribute, and if you bring patent claims against the work you lose your patent license. This has been upheld in many court cases - though it is not always easy to find litigants who can fund the cases, the licenses are sound.
I think you have what I am saying backwards. I am saying that the licenses should apply to the output of Copilot, like they apply to the output of compilers.
"Computers don't commit copyright" is a complete misreading or misunderstanding of another proposition, that "computers cannot author a work".
Authoring is the act that causes a work to be copyrightable. In most jurisdictions, authoring a work automatically causes copyright to subsist in the work to some degree. The purpose of the copyright system is to encourage people to author new, original works, by rewarding those who do with exclusive rights. It is well-known that only humans can author a work. Computers simply cannot do it. If your computer (by some kind of integer overflow UB miracle) accidentally prints out a beautiful artwork, NOBODY has exclusive copyright over it, and anyone may reproduce it without limitation. Same goes for that monkey who took a selfie.
What a compiler does, on the other hand, is adapt a work. Adapting a work is not authoring it. Sometimes when you adapt a work, you also author some original work yourself, like when you translate a book into another language. When a compiler (not a linker) transforms source code, it absolutely, 100% definitely does NOT add any original work; the executable or .so/.a/.dylib/.dll file is simply an adaptation of the original work. The copyright-holder of the source code is the copyright-holder of the machine code. An adaptation is also known as a "derivative work".
(Side note; copyleft licenses boil down to some variation of "if you adapt this, you have to share everything in the derivative work, not just the bits you copied.")
Adaptation is a form of reproduction. It's copying. "Distribution" also often involves copying, at least on the internet. (Selling or giving away a book you have purchased does not constitute copying.) Copying is one of the exclusive rights you have when you own the copyright in a work, that you may then license out.
It gets more complicated when the computer uses fancy ML methods to produce images/text out of things it has seen/read. You can't simplify the law around that to a simple adage digestible enough to share memetically on HN and Twitter. One thing is certain: if the computer did it, by itself, then no original work was authored in the process. That poses a problem for people who write the name of a function and get CoPilot to write the rest; if you do that, you are not the author of that part of the program. If you use it more interactively that's a different story.
There is, however, always a question of whether the copyright in the original works the computer used still subsists in the output.
My rough framing of the licensing issues around CoPilot is therefore as follows:
1. The source code to CoPilot is an original work, and the copyright is owned by GitHub.
2. When GH trained CoPilot's models on other people's works, was that copying? (This one is partially answered. It can spit out verbatim fragments, so it must be copying to some extent, rather than e.g. actually learning how to code from first principles by reading.) If it was not all copying, how much of it was copying and how much of it was something else? What else was it?
3. If GH adapted the originals, what is the derivative work? (i.e. where does the copyright subsist now? Is it a blob of random fragments of code with some weights in a neural network?)
4. Which works is it an adaptation of? You might think "all of them, and for each one, all of the code" but I'm not so sure. For example, imagine the ML blob contains many fragments, but some are shorter than others. If your program has "int x;" in it, and CoPilot can name a variable "x", you can hardly claim that as your own. I'm most interested in whether the mere fact of CoPilot having digested ALL of it, having fed this into the mix and producing a ML blob based on all that information, means that the ML blob is a derivative work of all of them. Or whether there is some question of degree.
5. Fair use. Was it fair use to train the model? Is it, separately or not, fair use to create a commercial product from the model and sell it? Fair use cares about commercial use, nature of the copied work, amount of copying in relation to the whole, and the effect on the market for / value of the copied work. Massive question.
6. If not fair use, then GH is subject to the licenses and how they regulate use of the works. What license conditions must GH comply with when they deal with the derivative work, and how? Many will be tempted to jump straight to this question and say GH must release the source code to CoPilot. I'm not yet convinced that e.g. GPL would require this. I can't believe I'm writing this, but is the ML blob statically or dynamically linked? Lol.
7. Final question: is there some way to separate out works which were copied under fair use (or not copied at all) from works which were copied with no fair use? People are worried about code laundering, e.g. typing the preamble to a kernel function and reproducing it in full. In that situation, it is fairly obvious that the end user has ultimately copied code from the kernel and needs to abide by GPL 2.0; moreover if they're using CoPilot to write out large swathes of text they will naturally be alert to this possibility and wary of using its output. But think of the converse: if there is no way to get CoPilot to reproduce something you wrote, what's the substance of your complaint? Is CoPilot's model really a derivative of your work, any more than me, having read your code, being better at coding now? Strategically, if you wanted to get GH to distribute the model in full, you might only need one copyleft-licensed, verbatim-reproducible work's owner to complain. But then they would just remove the complainant's code. You might be looking at forcing them to have a "do not use in CoPilot" button or something.
I think this is more cogent analysis than anything else I've seen yet on this topic. You should consider submitting a blog post so this can become a top-level topic.
Also, I loved this quote:
> Copying is one of the exclusive rights you have when you own the copyright in a work, that you may then license out.
I've been paying attention to software copyright topics for more than twenty years and never thought of it in exactly these terms. It's right there in the name - the right to copy it, and to determine the terms under which others can copy it, is exactly what a copyright is!
I don't doubt that an army of lawyers has pored over this, but they have size on their side: the cost of litigation vs potential revenue will be a massive factor.
Edit:
> There's a decent bit of caselaw indicating that computers reading and using a copyrighted work simply "don't count" in terms of copyright infringement.
That means their computer can read any code it wants, do whatever it wants with the code, then they can monetise that by giving YOU the code. Would they then be indemnified by saying "no Microsoft human read or used this code"?
However, if you then use the code and look at it, does that make you liable?
Again, not a lawyer, just a guy who likes reading this stuff. The devil is usually in the details of copyright cases. The Turnitin case hinged substantially on whether Turnitin's use of copyrighted essays was "fair use". There are four factors[0] which determine fair use; the two more relevant factors here are "the purpose and character of your use" and "the effect of the use upon the potential market". The court found that Turnitin's use was highly "transformative" (meaning they didn't just e.g. republish essays; they transformed the copyrighted material into a black-box plagiarism detection service) and also found that Turnitin's use had minimal effect on the market (this is where "computers don't count" comes in -- computers reading copyrighted material don't affect the market much because a computer wasn't ever going to buy an essay).
I would be shocked if GitHub's lawyers didn't argue that using copyrighted material as training data for an AI model is highly transformative. There may be snippets available from the original but they are completely divorced from their original context and virtually unrecognizable unless they happen to be famous like the Quake inverse square root algorithm. And I think GitHub's lawyers would also argue that Copilot's use does not affect the _original_ market -- e.g. it does not hurt Quake's sales if their algorithm is anonymously used in a probably totally unrelated codebase.
Your counterexample would probably fail both tests -- it's not transformative use if your software hands out complete pieces of copyrighted software, and it would definitely affect the market if Copilot gave me the entire source code of Quake for my own game.
I thought I understood fair use but turns out I was wrong...
That being said, creating a transformative work from something else is considered fair use. So, for example, if I read a whole bunch of books and then, heavily influenced by them, create my own, similar book, that would be fair use I suppose... that makes sense.
But, where does the derivative works come in? Where do you draw the line?
If I am heavily influenced by billions of lines of other people's GPL code (a la Copilot!), then I create my own tool from it and keep my code hidden, does that not mean I am abusing the GPL license?
That's what I meant by the devil being in the details -- these gray area questions hinge on the specific facts. Lawyers on both sides will argue which factors apply based on past caselaw and available evidence, and the court renders a decision. For example, from the Stanford webpage I previously linked: "the creation of a Harry Potter encyclopedia was determined to be “slightly transformative” (because it made the Harry Potter terms and lexicons available in one volume), but this transformative quality was not enough to justify a fair use defense in light of the extensive verbatim use of text from the Harry Potter books". So you might be okay creating a Harry Potter encyclopedia in general, but not if your definitions are copy/pasted from the books, but you might still be okay quoting key lines from the books if the quotes are a small portion of your encyclopedia. The caselaw just doesn't lend itself to firm lines in the sand.
If you read a bunch of books and then create a similar book, that isn't transformative; transformative is like, you read a bunch of books and then create a machine translation service. The point of transformative is like "isn't going to conflict with the market or compete in any way with the original thing".
That’s funny, because the bedrock of copyright - insofar as software is concerned - is entirely predicated on the idea that a computer copying code into RAM to execute it is indeed a copyright violation outside of a license to do so.
I think the point though is that security warnings need to be actionable and high-signal. Experienced folks are absolutely tuning out the security warnings on npm install, because 95% of the warnings are like the examples in the post -- I know they don't affect me/my use case and there's nothing I can do about them anyway. The effect is only compounded for novices who run "npx create-react-app hello-world" and immediately see something incomprehensible about a vulnerability in react-scripts > webpack > watchpack > watchpack-chokidar2 > chokidar > glob-parent. It either discourages them from programming entirely or it teaches them to ignore security warnings.
I don't disagree with your overall point -- e.g. we should absolutely teach novices "here's what XSS is and how to avoid it" early and often. But if a dependency manager is going to surface a vulnerability alert every time I install dependencies, the alerts should be 1) high severity (to the point where I should actually stop using the package if I am unable to patch/upgrade) or 2) at least immediately actionable. The current npm audit implementation does the opposite -- 95% of the alerts are totally irrelevant to my actual security posture, and the suggested command to upgrade a vulnerable dependency is unreliable and can actually downgrade to an older, even-less-secure version (!).
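For what it's worth, npm does let you raise the severity threshold via the `audit-level` config (it governs when `npm audit` exits non-zero; it won't silence every install-time message, but it cuts a lot of the noise):

```ini
# .npmrc -- make `npm audit` fail only for high-severity advisories
# and above (equivalent to running `npm audit --audit-level=high`)
audit-level=high
```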
Yeah, this I totally agree with. Actionable alerts are important. The idea that novices should be opted out of ecosystem security concerns generally, less so (not least because they'll create their own security problems in the process).
I've used the following list of packages for eslint, prettier, and TypeScript. It's not as effortless as it should be, but the two LSP packages in particular do give reference navigation that's pretty equivalent to VS Code.
If you're on macos, I also recommend creating a file at ~/Library/Application Support/Sublime Text 3/Packages/User/Default (OSX).sublime-mousemap with the following contents -- this adds a cmd+click "go to definition" shortcut that's also equivalent to what VS Code provides. (I guess the path should be "Sublime Text 4" now? but after upgrading, the config at the "Sublime Text 3" path still works for me.)
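The mousemap contents are roughly as follows (this assumes the Sublime LSP package, which is what provides the `lsp_symbol_definition` command):

```json
[
  {
    "button": "button1",
    "count": 1,
    "modifiers": ["super"],
    "press_command": "drag_select",
    "command": "lsp_symbol_definition"
  }
]
```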
I've tried to switch to VS Code a few times -- language features (especially TypeScript) tend to work better out-of-the-box but it still isn't close in terms of performance. If you want IDE-like features to "just work" then VS Code is definitely the best choice, but the persistent (albeit slight) input lag drives me up the wall.
AJAX (Asynchronous Javascript and XML) is purely orthogonal to DOM updates. An AJAX request is just a network request initiated asynchronously via some Javascript on the page, as opposed to a request initiated synchronously when a user clicks a link or submits a form. Response data from an AJAX request could be used to make a direct DOM mutation, update React state, log to the console, or anything else the developer wants.
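A minimal sketch of that separation (the endpoint and helper names here are made up, and an in-memory stub stands in for the network so the example is self-contained; a real page would use `fetch` or `XMLHttpRequest`):

```javascript
// Hypothetical in-memory "server" standing in for a real network endpoint,
// so this sketch runs without a browser or network.
function fakeFetchJson(url) {
  return Promise.resolve({ items: ["a", "b", "c"] });
}

// The "AJAX" part is only the asynchronous request itself; what you do
// with the response afterwards is a separate decision.
async function loadItems() {
  const data = await fakeFetchJson("/api/items");
  // At this point you could mutate the DOM directly, set React state,
  // or just log the result -- AJAX itself dictates none of that.
  return data.items;
}

loadItems().then(items => console.log(items.length)); // prints 3
```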
Most of the issues on that dashboard don't seem related to the JS SDK, e.g. "Increased latency on marketing insights API" and "Instagram comments webhooks event delivery traffic drop". It's a blended view of every(?) Facebook developer product but you'd usually calculate 9's for each individual product.
The problem here is that the JS SDK encompasses all of their products. They don't have a different script or bundle to download for each thing. You just configure your code to point to their JS SDK url, with some params for which version you want, how you want to use it (xml, json), if it should use cookies or some other tracking method. Then it sends you what they think you need based on those params along with your app id so they can see your app configuration, all in one bundle.
This makes it very complicated to say what is actually down or unavailable, which is why I'm guessing the status is "degraded performance", not "down" and they're not calling it an "outage", because technically other parts of the SDK are completely unaffected.
To say that anything in particular is down, they'd have to list a set of API endpoints that are down or a set of very specific features. This is to their advantage though: [1] as already noted, they don't have to say that their SDK is "down", since it's technically not, and [2] other people are still going to argue that "hey, it's up for me".
End result, they don't really care how anybody else feels about it. What are people going to do, move to another platform? Stop using facebook to generate traffic and thus revenue? :eye_roll:
Police absolutely do not have to read your Miranda rights in order to arrest you -- they only need to read your rights if they plan to question you with regards to a criminal investigation and they would like to use your statements as evidence in a court of law.
Breyer has actually been writing on these issues for quite some time; see for example his article "The Uneasy Case for Copyright: A Study of Copyright in Books, Photocopies, and Computer Programs" [0] -- published in the Harvard Law Review in 1970. His overall body of work demonstrates a pretty good understanding of the underlying technical issues, and he has been a reliable ally on the bench in this area.
Aha! is the #1 tool for product managers to plan strategy and roadmaps. We serve more than 300,000 users worldwide.
We are looking for:
* Experienced full-stack engineers to work on the Aha! product. Our application is built in Ruby on Rails, with some React on the frontend for rich client-side experiences.
* Security engineers, with hands-on Rails development experience plus experience with compliance projects, security policy development, or other security initiatives.
* A senior product manager with experience serving the needs of product and/or engineering teams.
Aha! is profitable, you can work from anywhere in North or South America, and we offer excellent benefits. We use our own product to manage our work (which is especially rewarding) and we deploy continuously.
Our entire team is remote - in North American timezones so we can collaborate during the work day.
You can view open engineering positions at https://www.aha.io/company/careers/current-openings, and click through to a specific job for our simple application form. Our job postings also have a lot more detail about the team, our values, and what you'd be doing day-to-day.
so... anywhere in the world?