OpenAI must turn over 20 million ChatGPT conversations to New York Times

Court rejects OpenAI stay request on November 13, requiring disclosure of 20 million anonymized chat logs by November 14 in copyright lawsuit discovery.

OpenAI must proceed with turning over 20 million anonymized ChatGPT conversation logs to the New York Times after a federal judge denied the company's request to pause the production order on November 13, 2025. US Magistrate Judge Ona Wang rejected OpenAI's motion to stay her November 7 order while the company pursues reconsideration, though she left open the possibility for OpenAI to renew the request. The ruling establishes a significant precedent for how courts will handle user privacy protections against litigation discovery demands in artificial intelligence copyright cases.

Wang's November 7 order directed OpenAI to produce the 20 million de-identified consumer ChatGPT logs to news plaintiffs by November 14, or within seven days of completing the de-identification process. The November 13 decision denying the stay means OpenAI faces immediate compliance obligations unless it successfully obtains emergency relief from a higher court. OpenAI requested additional time to file a reply brief on its reconsideration motion, which Wang granted with a November 19 deadline for submissions not to exceed two pages.

The chat logs represent a random sampling of complete user conversations from December 2022 to November 2024. They do not include conversations from ChatGPT Enterprise, ChatGPT Edu, ChatGPT Business customers, or API customers, according to OpenAI's disclosures. Each log contains a complete exchange of multiple prompt-output pairs between a user and ChatGPT, potentially exposing up to 80 million individual interactions across the 20 million sampled conversations.
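For readers weighing the scale involved, the sketch below models a log the way the filings describe it, as an ordered list of prompt-output pairs. The Python field names are hypothetical rather than OpenAI's actual schema, and the arithmetic simply relates the two figures reported above.

```python
from dataclasses import dataclass

@dataclass
class PromptOutputPair:
    prompt: str   # de-identified user message
    output: str   # ChatGPT's response

@dataclass
class ConversationLog:
    conversation_id: str             # pseudonymous identifier
    pairs: list[PromptOutputPair]    # the full back-and-forth exchange

# The 80 million figure implies an average of roughly four
# prompt-output pairs per sampled conversation:
SAMPLED_CONVERSATIONS = 20_000_000
ESTIMATED_INTERACTIONS = 80_000_000
print(ESTIMATED_INTERACTIONS / SAMPLED_CONVERSATIONS)  # 4.0
```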

Wang ruled on November 7 that OpenAI failed to explain how consumer privacy rights were not adequately protected by the existing protective order in the multidistrict litigation and by OpenAI's de-identification procedures. The judge found that whether the parties had actually agreed to produce all 20 million consumer ChatGPT logs remained vehemently disputed, but held that wholesale production was appropriate regardless. OpenAI implemented what it characterized as exhaustive de-identification protocols to remove personal identifying information from the sampled conversations before any potential disclosure.
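OpenAI has not published the details of those protocols. As a minimal illustration of what de-identification can involve, the hypothetical Python sketch below redacts a few common identifier patterns; production pipelines typically layer statistical named-entity models and human review on top of pattern matching.

```python
import re

# Minimal illustration only: real de-identification pipelines combine
# pattern matching with NER models and human review.
REDACTION_PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "[PHONE]": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    "[SSN]":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens."""
    for placeholder, pattern in REDACTION_PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

print(redact("Reach me at jane.doe@example.com or +1 (555) 123-4567."))
# Reach me at [EMAIL] or [PHONE].
```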

The New York Times and other news outlets sued OpenAI for copyright infringement, alleging the company misused their articles to train ChatGPT to respond to user prompts. The plaintiffs' October 30 filing accused OpenAI of defying prior agreements by refusing to produce even a small sample of billions of model outputs that its conduct has put in issue in the case. According to the news plaintiffs, immediate production of the output log sample is essential to stay on track for the February 26, 2026 discovery deadline.

The plaintiffs argued they cannot reasonably conduct expert analyses about how OpenAI's models function in its core consumer-facing product without access to the model outputs themselves. They seek to determine whether ChatGPT reproduced their copyrighted content and to rebut OpenAI's assertion that they "hacked" the chatbot's responses to manufacture evidence. Their motion emphasized that OpenAI's proposal to run searches on the logs on plaintiffs' behalf is as inefficient as it is inadequate to allow fair analysis of how real-world users interact with a core product at the center of this litigation.

OpenAI filed its motion for reconsideration on November 12, arguing that turning over the logs would disclose confidential user information. The company maintained that more than 99.99% of the transcripts have nothing to do with the copyright infringement allegations in the case. "To be clear: anyone in the world who has used ChatGPT in the past three years must now face the possibility that their personal conversations will be handed over to The Times to sift through at will in a speculative fishing expedition," OpenAI stated in its motion.

The artificial intelligence company presented several privacy-preserving alternatives to the New York Times before the court issued its production order. These included targeted searches over the sample to identify chats that might include text from New York Times articles, ensuring plaintiffs only receive conversations relevant to their claims. OpenAI also offered high-level data classifying how ChatGPT was used in the sample. The New York Times rejected these proposals, according to OpenAI's statements.
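The filings do not describe how such targeted searches would work mechanically. One plausible approach, sketched below in Python as an assumption rather than OpenAI's actual method, is to flag any chat that shares a long enough run of consecutive words with an article.

```python
def shingles(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Overlapping n-word sequences, a common unit for copy detection."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def chat_quotes_article(chat: str, article: str, n: int = 8) -> bool:
    """Flag a chat if it shares any n-word run with the article."""
    return bool(shingles(chat, n) & shingles(article, n))

article = "the quick brown fox jumps over the lazy dog near the riverbank"
chat = "it said the quick brown fox jumps over the lazy dog to me"
print(chat_quotes_article(chat, article))  # True
```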

Wang cited Concord Music Group v. Anthropic as precedent for ordering wholesale production of the sampled logs. In that California case, US Magistrate Judge Susan van Keulen directed production of a 5 million-record sample to plaintiffs. OpenAI challenged the comparison, noting that the news plaintiffs never cited the case in their motion, which left OpenAI no opportunity to explain why Concord should not apply. According to OpenAI, the Concord order addressed the mechanism for an already agreed-upon production rather than whether wholesale production was appropriate when privacy concerns had been raised.

The Concord logs consisted of individual prompt-output pairs rather than complete conversations, OpenAI argued. Each log in the 20 million sample at issue in the New York Times case represents a complete exchange of multiple prompt-output pairs. This structural difference makes disclosure more likely to expose private information, in the same way that eavesdropping on an entire conversation reveals more than a five-second fragment, according to OpenAI's filing.

OpenAI Chief Information Security Officer Dane Stuckey published a statement on November 12 stating that 800 million people use ChatGPT weekly. The demand would force OpenAI to turn over tens of millions of highly personal conversations from people with no connection to what Stuckey characterized as the Times' baseless lawsuit. He stated the demand disregards long-standing privacy protections and breaks with common-sense security practices that have guided technology discovery for decades.

The content covered by the court order is currently stored separately in a secure system protected under legal hold. Only a small, audited OpenAI legal and security team can access this data as necessary to comply with legal obligations. Under the current order, the Times' outside counsel of record and their retained technical consultants would be able to access the conversations. OpenAI stated it would push to allow the Times to view the data only in a secured environment maintained under strict legal protocols.
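OpenAI's actual access controls are not public. A minimal sketch of the general pattern the company describes, an allowlist of cleared reviewers combined with an append-only audit log, might look like the following; every identifier here is hypothetical.

```python
import datetime

# Hypothetical allowlist of cleared reviewers; OpenAI's actual
# controls are not public.
AUTHORIZED_REVIEWERS = {"legal-001", "security-007"}
AUDIT_LOG: list[dict] = []

def access_log(reviewer_id: str, conversation_id: str) -> str:
    """Gate access behind the allowlist and record every lookup."""
    if reviewer_id not in AUTHORIZED_REVIEWERS:
        raise PermissionError(f"{reviewer_id} is not on the legal-hold team")
    AUDIT_LOG.append({
        "reviewer": reviewer_id,
        "conversation": conversation_id,
        "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    })
    return f"<de-identified contents of {conversation_id}>"
```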

A New York Times spokesperson disputed OpenAI's characterization of the privacy implications. "No ChatGPT user's privacy is at risk," the spokesperson stated. "The court ordered OpenAI to provide a sample of chats, anonymized by OpenAI itself, under a legal protective order." The spokesperson described OpenAI's public statements as purposely misleading users and omitting facts, noting that OpenAI's own terms of service permit the company to train its models on user chats and turn over chats for litigation.

According to OpenAI's disclosures, the Times is currently legally obligated not to make any of the data public outside the court process. However, OpenAI stated that if the Times pushes to access the data in any way that would make the conversations public, it will fight to protect user privacy at every step. The Times' original request in the lawsuit was significantly broader, initially demanding 1.4 billion private ChatGPT conversations before OpenAI successfully pushed back through the legal process.

OpenAI argued that courts do not allow plaintiffs suing Google to dig through private emails of tens of millions of Gmail users irrespective of their relevance to litigation. Discovery should not work differently for generative AI tools, the company maintained. OpenAI characterized the production order as setting a dangerous precedent suggesting that anyone filing a lawsuit against an AI company can demand production of tens of millions of conversations without first narrowing for relevance.

The company has begun accelerating its security and privacy roadmap in response to litigation pressures. OpenAI's long-term plans include advanced security features designed to keep user data private, including client-side encryption for messages with ChatGPT. These features would make private conversations inaccessible to anyone else, even OpenAI itself. The company plans to build fully automated systems to detect safety issues in products, with only serious misuse and critical risks potentially escalated to a small team of human reviewers.
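OpenAI has not published a design for this encryption. As a rough illustration of the client-side principle, where the key never leaves the user's device so stored ciphertext is opaque to the provider, the sketch below uses the third-party cryptography package's Fernet primitive; it is an assumption about the approach, not OpenAI's implementation.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# In a real client-side scheme the key would be generated and stored
# only on the user's device (or derived from a passphrase), never
# transmitted to the server.
client_key = Fernet.generate_key()
cipher = Fernet(client_key)

message = b"draft campaign strategy for Q3"
ciphertext = cipher.encrypt(message)   # what the server would store

# Without client_key, the stored ciphertext is opaque to the provider.
assert cipher.decrypt(ciphertext) == message
```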

The case represents one of many pending lawsuits against technology companies over alleged misuse of copyrighted work to train AI systems. Ziff Davis filed a major copyright lawsuit against OpenAI in April 2025, accusing the company of unauthorized use of content from 45 properties including CNET, IGN, and Mashable. Anthropic agreed to a $1.5 billion settlement in September 2025 in what became the largest publicly reported copyright recovery in history.

The discovery dispute carries significant implications for the marketing community. Consumer trust in AI data practices remains fragile, with 59% of consumers expressing discomfort with their data being used to train AI systems, according to research published in July 2025. The precedent established by this case will influence how companies handle sensitive user conversation data in future litigation and regulatory proceedings as AI systems become more deeply integrated into marketing workflows.

Meta announced plans to use AI chat data for ad targeting starting December 2025, demonstrating how conversational AI data is becoming increasingly valuable for commercial applications. The OpenAI case tests how courts will balance these commercial interests against user privacy expectations when litigation discovery demands collide with platform privacy commitments. The outcome will shape legal strategies for both AI companies defending copyright claims and publishers seeking to protect their intellectual property.

Summary

Who: OpenAI must comply with a court order to turn over user conversations to the New York Times and other news outlets in copyright litigation. US Magistrate Judge Ona Wang issued the production order and denied OpenAI's stay request. OpenAI Chief Information Security Officer Dane Stuckey published statements defending the company's position on user privacy.

What: OpenAI must produce 20 million anonymized ChatGPT conversation logs representing a random sampling of complete user conversations from December 2022 to November 2024. The logs contain up to 80 million individual prompt-output pairs. Judge Wang denied OpenAI's November 12 request to stay the production order while the company pursues reconsideration, establishing November 14 as the delivery deadline unless OpenAI obtains emergency relief from a higher court.

When: Judge Wang issued the production order on November 7, 2025, with an initial November 14 deadline. OpenAI filed its motion for reconsideration and stay request on November 12. Wang denied the stay request on November 13, setting November 19 as the deadline for OpenAI's reply brief. The February 26, 2026 discovery deadline drives the plaintiffs' urgency for immediate production.

Where: The dispute is proceeding in US District Court for the Southern District of New York as part of multidistrict litigation consolidated under case number 25-md-3143. The chat logs are currently stored in a separate secure system maintained by OpenAI under legal hold protections. The Times' outside counsel and technical consultants would access the conversations under strict legal protocols.

Why: This discovery dispute matters for the marketing community because it establishes how courts will balance user privacy protections against litigation discovery rights when AI systems are challenged for copyright infringement. The outcome affects consumer trust in AI data practices, with existing research showing 59% of consumers uncomfortable with data being used to train AI systems. As platforms like Meta prepare to use AI chat data for ad targeting, this precedent will influence how companies handle sensitive conversational data across both litigation and commercial applications. The case demonstrates the legal risks AI companies face when building systems on potentially copyrighted content, with similar lawsuits resulting in major settlements exceeding $1.5 billion. Marketing professionals relying on AI tools must now consider how litigation discovery demands could expose proprietary strategies, client information, and competitive intelligence stored in conversational interfaces.