Court dismisses Raw Story's landmark DMCA case against OpenAI over training data

In a significant ruling issued on November 7, 2024, the United States District Court for the Southern District of New York dismissed a lawsuit filed by Raw Story Media and AlterNet Media against OpenAI. Judge Colleen McMahon granted OpenAI's motion to dismiss, determining that the news organizations lacked Article III standing to pursue their claims under the Digital Millennium Copyright Act (DMCA).

The case centered on allegations that OpenAI had removed copyright management information (CMI) from thousands of news articles when training its ChatGPT model. According to court documents, Raw Story Media and AlterNet Media collectively published over 400,000 articles that were allegedly scraped and included in OpenAI's training datasets: WebText, WebText2, and Common Crawl.

The plaintiffs claimed that OpenAI stripped their content of author, title, and copyright information before using it to train ChatGPT. This removal of CMI, they argued, violated Section 1202(b)(1) of the DMCA. The news organizations sought both monetary damages and injunctive relief, requesting that OpenAI remove all copies of their copyrighted works from its training sets and repositories.

Judge McMahon's decision hinged on the fundamental question of Article III standing, which requires plaintiffs to demonstrate concrete injury, causation, and redressability. The court found that the mere removal of identifying information from copyrighted works, without dissemination, did not constitute a concrete injury sufficient for Article III standing.

Technical analysis of the court's decision

Understanding Article III standing

The court emphasized that Article III standing requires concrete injury even in statutory violation cases. Judge McMahon cited the Supreme Court's decisions in Spokeo v. Robins and TransUnion LLC v. Ramirez, highlighting that plaintiffs must demonstrate standing for each form of relief sought.

DMCA Section 1202 requirements

According to court documents, Section 1202(b)(1) of the DMCA incorporates a double-scienter requirement:

Knowledge that CMI has been removed without authority
Knowledge or reasonable grounds to know that the removal will induce, enable, facilitate, or conceal copyright infringement

Analysis of damages claim

The court rejected the plaintiffs' argument that unauthorized CMI removal alone constitutes concrete injury. Judge McMahon distinguished between Section 1202's purpose of protecting CMI integrity and other Copyright Act provisions that guard against unauthorized reproduction.

Injunctive relief standards

For the injunctive relief claim, the court analyzed whether there was a "substantial risk" of future harm. Despite plaintiffs' concerns about ChatGPT potentially reproducing their work without CMI, the court found the risk too remote given the vast amount of training data involved.

Historical context

Evolution of copyright protection

The DMCA emerged from international obligations under the World Intellectual Property Organization treaties. According to court documents, its purpose was to "ensure the integrity of the electronic marketplace by preventing fraud and misinformation."

AI training data controversy

The case highlights ongoing tensions between AI companies and content creators. As noted in the complaint, OpenAI has acknowledged that using copyright-protected works for AI training may require licensing agreements, and has entered into such arrangements with some large copyright owners.

Legal framework development

The decision illustrates courts' evolving approach to AI-related copyright claims. Judge McMahon noted that while this specific DMCA claim failed, other legal theories might address the underlying concern about uncompensated use of copyrighted material for AI training.

The case's significance

The dismissal carries significant implications for both AI companies and content creators. The court's decision suggests that simply having content included in AI training datasets, without specific instances of harmful use, may not be sufficient for legal standing.

Judge McMahon left open the possibility for amended pleadings, particularly regarding injunctive relief claims. However, the court expressed skepticism about plaintiffs' ability to allege cognizable injury under the current legal framework.

The ruling also highlights a broader issue identified by the court: the real concern may not be CMI removal but rather the fundamental question of compensation for content used in AI training. As Judge McMahon noted, this raises questions about whether other legal theories might better address these emerging technological challenges.

Key facts

Case Number: 24 Civ. 01514
Decision Date: November 7, 2024
Court: United States District Court, Southern District of New York
Judge: McMahon, J.
Plaintiffs: Raw Story Media, Inc. and AlterNet Media, Inc.
Defendants: OpenAI entities (OpenAI, Inc., OpenAI GP, LLC, OpenAI, LLC, OpenAI Opco LLC, OpenAI Global LLC, and OpenAI Holdings, LLC)
Key Issue: Alleged violation of DMCA Section 1202(b)(1); dismissed for lack of Article III standing
Outcome: Motion to dismiss granted
Leave to Amend: Denied without prejudice to renewal by a proper motion accompanied by a proposed amended pleading