
I think the court order doesn't quite go against as many norms as OpenAI is claiming. It's very reasonable to retain data pertinent to a case, and NYT's case almost certainly turns on establishing copyright infringement damages, which are calculated based on the number of violations (how many users queried ChatGPT and were returned verbatim copyrighted material from NYT).
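
As a rough, purely illustrative calculation (placeholder numbers, not figures from the case, and actual statutory damages rules are more nuanced), the per-violation theory means damages scale directly with a count that only these logs can establish:

    # Hypothetical back-of-the-envelope: under a per-violation damages
    # theory, the retained logs are what establish the multiplier.
    infringing_responses = 250_000  # placeholder count from discovery
    damages_per_violation = 750     # placeholder dollar figure, not a legal claim
    print(f"${infringing_responses * damages_per_violation:,}")  # -> $187,500,000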

If you don’t retain that data you’re destroying evidence for the case.

It’s not like the data is going to be given to anyone; it’s only going to be used for limited legal purposes in the lawsuit (as OpenAI confirms in this article).

And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem. I saw NYT’s filing and it had very compelling evidence that you could get ChatGPT to distribute verbatim copyrighted text from the Times without citation.



It absolutely goes against norms in many countries other than the US, and the data of residents/citizens of these countries are affected too.

> It’s not like the data is going to be given to anyone; it’s only going to be used for limited legal purposes in the lawsuit (as OpenAI confirms in this article).

Nobody other than both parties to the case, their lawyers, the court, and whatever case file storage system they use. In my view, that's already way too much given the amount and value of this data.


Countries other than the US aren't part of this lawsuit. ChatGPT operates in the US under US law. I don't know if they have separate data storage for other countries.

I don't believe you would be considered to be violating the GDPR if you are complying with another jurisdiction's court order, because you are presumably making a best effort to comply with the GDPR in every respect apart from that order.

You're saying it's unreasonable to store data somewhere for a pending court case? Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information. That's ridiculous; if that were true, it would be impossible to perform discovery and get anything done in court.


> I don't believe you would be considered to be violating the GDPR if you are complying with another jurisdiction's court order, because you are presumably making a best effort to comply with the GDPR in every respect apart from that order.

It most likely depends on the exact circumstances. I could absolutely imagine a European court deciding that, sorry, but if you have to answer to a court decision incompatible with European privacy laws, you can't offer services to European residents anymore.

> You're saying it's unreasonable to store data somewhere for a pending court case?

I'm saying it can be, depending on how much personal and/or unrelated data gets tangled up in it. That seems to be the case here.

> Conceptually you're saying that you can't preserve data for trials because the filing cabinets might see the information.

I'm only saying that there should be proportionality. A court having access to all facts relevant to a case is important, but it's not the only important thing in the world.

Otherwise, we could easily end up with a Dirk-Gently-esque court that, based on the principle that everything is connected to everything, will just demand access to all the data in the world.


The scope of the data access required by the court is being worked out via due process. That’s why there’s an appeal system. OpenAI is just grandstanding in a public forum so that their customers don’t defect.

When it comes to the GDPR, US courts have generally taken the stance that it does not override their discovery obligations. For example:

Ironburg Inventions, Ltd. v. Valve Corp.

Finjan, Inc. v. Zscaler, Inc.

Corel Software, LLC v. Microsoft

Rollins Ranches, LLC v. Watson

In none of these cases was a GDPR fine issued.


Putting the merits of this specific case and positive vs. negative sentiments toward OpenAI aside, this tactic seems like it can be used to destroy any business or organization with customers who place a high value on privacy—without actually going through due process and winning a lawsuit.

Imagine a lawsuit against Signal that claimed some nefarious activity, harmful to the plaintiff, was occurring broadly in chats. The plaintiff can claim, like NYT, that it might be necessary to examine private chats in the future to make a determination about some aspect of the lawsuit, and the judge can then order Signal to find a way to retain all chats for potential review.

However you feel about OpenAI, this is not a good precedent for user privacy and security.


I'm confused at how you think that NYT isn't going through due process and attempting to win a lawsuit.

The court isn't saying "preserve this data forever and ever and compromise everyone's privacy," they're saying "preserve this data for the purposes of this court while we perform an investigation."

IMO, the NYT has a very good argument here that the only way to determine the scope of the copyright infringement is to analyze requests and responses made by every single customer. Like I said in my original comment, the remedies for copyright infringement are on a per-infringement basis. E.g., every time someone on LimeWire downloads Song 2 by Blur from your PC, you've committed one instance of copyright infringement. My interpretation is that NYT wants the court to find out how many times customers have received ChatGPT responses that include verbatim New York Times content.
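
To make that concrete, here is a hedged sketch (made-up function names and thresholds, nothing from the actual filings) of how such a per-response count might be computed over retained logs:

    # Hypothetical sketch: count responses containing a long verbatim run of
    # words from any article in a copyrighted corpus. The 12-word window is
    # an illustrative threshold, not a legal standard.
    def ngrams(text, n=12):
        words = text.split()
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

    def count_verbatim_hits(responses, articles):
        article_grams = {g for a in articles for g in ngrams(a)}
        return sum(1 for r in responses
                   if any(g in article_grams for g in ngrams(r)))

    # Each hit would be one potential instance of infringement under the
    # per-violation theory described above.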


I don't think you're addressing my argument. If the "due process" destroys customer trust in the business being sued, regardless of the verdict, that's not really due process.


That's not entirely fair. The argument isn't "users are using the service to break the law" but rather "the service is facilitating lawbreaking". To fix your Signal analogy, suppose you could use the chat interface to request copyrighted material from the operator.


That doesn't change the outcome: the app still has to hand over everyone's plaintext messages, including the chat history of every user.


Right. But requiring logs due to suspicion that the service itself is actively violating the law is entirely different from doing so on the basis that end users might be up to no good entirely independently.

Also OpenAI was never E2EE to begin with. They were already retaining logs for some period of time.

My personal view is that the court order is overly broad and disregards potential impacts on end users but it's nonetheless important to be accurate about what is and isn't happening here.


Again, keep in mind that we are talking about a case-limited analysis of that data under the confidentiality of the court system.

For example, if the trial happens to surface private chats that include evidence of crimes committed by users, the court can't just send police to your door based on that information, since it is only being used in the context of an intellectual property lawsuit.

Remember that privacy rights are legitimate rights, but they change a lot when you're in the context of an investigation or court proceeding. E.g., the right of police to enter and search your home changes a lot when they get a court-issued warrant.

The whole point of E2EE services, from the perspective of privacy-conscious customers, is that a court can get a warrant for data from those companies but they'll only be able to produce encrypted blobs with no access to decryption keys. OpenAI was never an E2EE service, so customers have to expect that a court order could surface their data to someone else's eyes at some point.
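
For contrast, a minimal sketch of that property, assuming a simple client-side encryption scheme (Python's `cryptography` package; all names here are illustrative, not how any real service is built):

    from cryptography.fernet import Fernet

    # The key is generated and kept on the client device; the server never sees it.
    client_key = Fernet.generate_key()
    client = Fernet(client_key)

    # All the server stores (and all a court order can surface) is ciphertext:
    stored_blob = client.encrypt(b"private chat message")

    # Only the key holder can recover the plaintext:
    assert client.decrypt(stored_blob) == b"private chat message"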


> And honestly, OpenAI should have just not used copyrighted data illegally and they would have never had this problem

The whole premise of OpenAI's defense is that they didn't do anything unlawful, so saying "just do what the NYT wanted you to do" isn't interesting.


No, you're misinterpreting how discovery and the court system work.

The NYT made an argument to a judge about what they think is going on and how they think the copyright infringement is taking place and harming them. In their filings and hearings they present the reasoning and evidence that lead them to believe a violation is occurring. The court then makes a judgment on whether or not to order OpenAI to preserve and disclose information relevant to the case.

It's not "just do what NYT wanted you to do," it's "do what the court orders you to do based on a lawsuit brought by a plaintiff and argued to the court."

I suggest you read the court filing: https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...



