Anthropic appeals largest copyright class action certification
Certified copyright class could include up to 7 million book claimants in a lawsuit threatening hundreds of billions of dollars in damages.

On July 31, 2025, Anthropic PBC filed a petition seeking permission to appeal what industry groups describe as the largest copyright class action ever certified. The artificial intelligence company faces potential damages reaching hundreds of billions of dollars after a federal judge certified a class that could include up to 7 million claimants whose works span over a century of publishing history.
The class action certification stems from a lawsuit filed by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson in August 2024. These authors alleged that Anthropic infringed their federal copyrights by downloading and using their books without authorization to train the large language models powering its Claude AI service.
According to court documents, Anthropic downloaded over 7 million pirated copies of books from sources including Books3, Library Genesis, and Pirate Library Mirror between January 2021 and July 2022. The company stored these copies in what it described as a "central library" and used various subsets to train different versions of its AI models.
Subscribe to the PPC Land newsletter ✉️ for stories like this one. Receive the news every day in your inbox. Free of ads. 10 USD per year.
Split decision creates complex legal landscape
Senior U.S. District Judge William Alsup delivered a mixed ruling on June 23, 2025, that distinguished between different uses of copyrighted materials. The court found that using books to train large language models constituted "exceedingly transformative" fair use under Section 107 of the Copyright Act. The judge characterized the training process as "quintessentially transformative" and noted that no infringing outputs reached users through the Claude service.
However, Alsup rejected Anthropic's fair use defense for pirated library copies. The court determined that downloading millions of books from pirate sites to build a permanent research library constituted a separate use that was "not a transformative one." The judge noted that Anthropic retained pirated copies even after deciding they would not be used for training AI models.
The ruling also approved Anthropic's practice of purchasing print books and converting them to digital format for storage purposes. The court found this format change transformative because it saved storage space and enabled searchability without creating additional copies or redistributing content.
Class certification raises unprecedented concerns
On July 17, 2025, Judge Alsup certified a class covering "all beneficial or legal copyright owners of the exclusive right to reproduce copies of any book in the versions of LibGen or PiLiMi downloaded by Anthropic." The class definition includes works possessing ISBN or ASIN identifiers that were registered with the United States Copyright Office within specific timeframes.
The certification decision has drawn criticism from multiple stakeholders, including industry groups representing technology companies and advocates for authors and digital rights. According to court filings, the Consumer Technology Association and Computer and Communications Industry Association warned that the certification threatens "immense harm not only to a single AI company, but to the entire fledgling AI industry and to America's global technological competitiveness."
Authors Alliance, Electronic Frontier Foundation, American Library Association, and other advocacy groups also filed briefs opposing the certification. These organizations argued that the district court made "almost no meaningful inquiry into who the actual members are likely to be" and failed to consider the complexity of determining ownership across millions of books.
Financial implications reach industry-wide scope
The potential damages in the case are staggering. With statutory damages ranging from $200 for innocent infringement to $150,000 for willful infringement per work, and the possibility of up to 7 million claimants, the total liability could reach hundreds of billions of dollars. According to Anthropic's financial declaration, the company has raised $15 billion to date but expects to operate at a loss in 2025 while generating no more than $5 billion in revenue.
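The exposure figures above follow from straightforward arithmetic. A quick sketch using the statutory per-work ranges and the upper-bound class size reported in the filings:

```python
# Statutory damages ranges under the Copyright Act, as cited in the case.
INNOCENT_MIN_PER_WORK = 200      # floor for innocent infringement
WILLFUL_MAX_PER_WORK = 150_000   # ceiling for willful infringement

claimants = 7_000_000  # upper bound on class size certified by the court

low_exposure = claimants * INNOCENT_MIN_PER_WORK
high_exposure = claimants * WILLFUL_MAX_PER_WORK

print(f"Low end:  ${low_exposure:,}")   # $1,400,000,000  (about $1.4 billion)
print(f"High end: ${high_exposure:,}")  # $1,050,000,000,000  (over $1 trillion)
```

Even the innocent-infringement floor across the full class exceeds a billion dollars, which is why the "hundreds of billions" estimates sit comfortably inside this range.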
Chief Financial Officer Krishna Rao stated in court documents that Anthropic appears "very unlikely" to obtain an appeal bond for the total possible amount of statutory damages should the company be found liable. This financial constraint creates what the company describes as coercive settlement pressure that could prevent proper adjudication of important fair use questions.
The case's outcome could influence similar litigation across the industry. Other AI companies including OpenAI, Meta, Google, Microsoft, and NVIDIA face related copyright lawsuits. The Anthropic case has proceeded faster than these other matters, making it potentially precedent-setting for the broader industry.
Court proceedings face accelerated timeline
Judge Alsup has compressed the litigation schedule significantly, moving the case toward a December 1, 2025 trial date that is nearly a year ahead of the plaintiffs' initial proposal. The court also departed from its typical practice of prohibiting settlement discussions before class certification, citing the judge's intention to take inactive status before the end of 2025.
The expedited timeline has created additional pressure points in the case. Anthropic must produce detailed information about downloaded works by August 1, 2025, and plaintiffs must submit a comprehensive list of claimed works by September 1, 2025. This list will undergo what the court described as "Daubert and defendant's exacting scrutiny" before final approval.
The court's novel notice scheme requires class members making claims to notify potential competing rightsholders about their submissions. This approach has drawn criticism for placing the burden of identifying class members on competing claimants after trial, potentially compromising due process protections.
Technical challenges in copyright determination
The case highlights significant technical and legal challenges in identifying copyrighted works within AI training datasets. Anthropic used various metadata sources and hashing methods to track downloaded books, but court records show these systems contained errors and inconsistencies.
The Books3 dataset proved particularly problematic, containing 196,640 files with limited metadata often consisting only of filename information. Judge Alsup denied class certification for Books3-related claims, noting that identification of titles and authors "would be too problematic" due to incomplete metadata and content issues.
For the LibGen and PiLiMi datasets, Anthropic maintained more comprehensive bibliographic metadata including ISBN numbers, titles, authors, and hash values. However, even these datasets contained what experts estimate as roughly one percent error rates in their cataloging systems.
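Content hashing of the kind described in the court records is typically done with cryptographic digests, which let a cataloging system detect duplicate files regardless of filename. A minimal sketch of the general technique (the use of SHA-256 and these function names are illustrative assumptions, not details from the record):

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 digest of a file, reading in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_by_hash(paths):
    """Map each distinct content hash to the first path seen with that content."""
    seen = {}
    for p in paths:
        seen.setdefault(file_sha256(p), p)
    return seen
```

Two byte-identical copies of a book produce the same digest and collapse to one catalog entry, which is consistent with Anthropic's reported effort to avoid duplicating LibGen content when downloading from Pirate Library Mirror.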
Broader implications for AI development
The Anthropic litigation reflects broader tensions within the AI industry regarding content licensing and fair use. Companies have adopted divergent strategies, with some pursuing formal licensing agreements while others rely on fair use doctrine to justify training on copyrighted materials.
Google established partnerships with news organizations like The Associated Press, while OpenAI faces lawsuits from publishers including Ziff Davis and The New York Times. Meta recently won a summary judgment motion in a similar case, though that ruling applied only to specific plaintiffs.
The Consumer Technology Association warned that allowing copyright class actions in AI training cases could result in a future where "copyright questions remain unresolved and the risk of 'emboldened' claimants forcing enormous settlements will chill investments in AI." Industry groups argued that such potential liability "exerts incredibly coercive settlement pressure" that could stifle innovation.
Legislative and regulatory developments
The case proceeds amid increasing legislative attention to AI and copyright issues. Senator Peter Welch introduced the TRAIN Act in July 2025, which would establish administrative subpoena mechanisms allowing copyright owners to identify their works used in AI training datasets.
The U.S. Copyright Office released comprehensive guidance in May 2025 addressing generative AI training on copyrighted works. The office concluded that transformativeness and market effects would be the most significant factors in fair use analysis, while encouraging licensing arrangements for high-value creative content.
Academic research has also contributed to the debate, with Professor Carys Craig warning against expanding copyright restrictions on AI training data. Craig argued that such measures could create cost-prohibitive barriers favoring powerful market players while limiting beneficial AI development in healthcare and scientific research.
Appeal prospects and industry concerns
Anthropic's petition for permission to appeal raises three key questions: whether the district court manifestly erred in certifying a copyright class with millions of putative members presenting individualized issues; whether the court improperly resolved fair use questions at the class certification stage; and whether the potential damages create a "death knell" scenario justifying immediate appellate review.
The Ninth Circuit Court of Appeals has discretion to grant immediate review under Rule 23(f) and may weigh any relevant consideration. Courts typically find review most appropriate when decisions present unsettled fundamental legal issues, contain manifest errors, or create death knell scenarios for defendants.
Industry observers argue that one district court's rulings should not determine the fate of transformational AI technology or heavily influence the entire generative AI industry. The appeal represents a critical juncture for establishing legal frameworks governing AI training on copyrighted materials.
The case has attracted significant attention from advocacy groups representing diverse interests. Authors Alliance and related organizations expressed concern that the class certification could harm authors who might prefer individual litigation or different legal strategies. Technology industry groups warned that the precedent could devastate AI development and America's global technological competitiveness.
Timeline
- January-February 2021: Anthropic co-founders begin downloading pirated books from Books3, containing 196,640 unauthorized copies
- June 2021: Ben Mann downloads at least 5 million pirated books from Library Genesis using the BitTorrent protocol
- July 2022: Anthropic downloads at least 2 million books from Pirate Library Mirror to avoid duplicating LibGen content
- March 2023: Claude AI service launches publicly, first of seven successive versions released to date
- February 2024: Anthropic hires Tom Turvey to obtain "all the books in the world" while avoiding legal complications
- Spring 2024: Anthropic begins bulk-purchasing print books for destructive scanning and digital conversion
- August 2024: Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson file putative class action lawsuit
- October 2024: Court requires class certification motions by March 6, 2025, setting accelerated litigation schedule
- January 2025: Concord Music and other publishers reach partial agreement with Anthropic on copyright protection measures
- February 2025: Court grants Anthropic's motion for early summary judgment on fair use before class certification
- April 2025: Anthropic claws back spreadsheet showing data mix compositions used for training various LLMs
- May 2025: U.S. Copyright Office releases comprehensive AI training guidance addressing fair use questions
- June 2025: Reddit sues Anthropic over unauthorized AI training on platform data
- June 23, 2025: Judge William Alsup issues mixed ruling on fair use and piracy claims in landmark split decision
- June 25, 2025: Judge Chhabria grants Meta summary judgment in contrasting ruling on similar issues
- July 17, 2025: District court certifies class covering up to 7 million potential claimants in unprecedented decision
- July 31, 2025: Anthropic files petition for permission to appeal class certification under Rule 23(f)
- December 2025: Trial scheduled on pirated copies and resulting damages assessment
PPC Land explains
Fair use: A legal doctrine under Section 107 of the Copyright Act that permits limited use of copyrighted material without permission for purposes such as criticism, comment, news reporting, teaching, scholarship, or research. In the Anthropic case, the court found that training AI models constitutes "exceedingly transformative" fair use because it creates something fundamentally different from the original works. However, fair use analysis requires examining four factors: the purpose and character of use, the nature of the copyrighted work, the amount used, and the effect on the market for the original work.
Class action: A type of lawsuit where one or more plaintiffs represent a larger group of people with similar claims against the same defendant. The Anthropic class certification covers up to 7 million potential claimants whose copyrighted books were allegedly downloaded without authorization. Class actions allow efficient resolution of mass claims but require courts to ensure that common questions predominate over individual issues and that class representatives adequately protect absent members' interests.
Statutory damages: Monetary penalties established by copyright law that allow courts to award damages within specified ranges without requiring proof of actual financial losses. Under the Copyright Act, statutory damages range from $200 for innocent infringement to $150,000 per work for willful infringement. With potentially 7 million works at stake, Anthropic faces exposure to hundreds of billions of dollars in statutory damages, creating what the company describes as coercive settlement pressure.
Large language models (LLMs): Artificial intelligence systems trained on vast amounts of text data to understand and generate human-like language. Anthropic's Claude service is powered by LLMs that were trained using books, websites, and other textual materials. The training process involves analyzing statistical relationships between words and phrases across trillions of tokens, enabling the models to produce coherent responses to user prompts while incorporating learned patterns from the training data.
Copyright infringement: The unauthorized use of copyrighted material in ways that violate the exclusive rights granted to copyright owners. These rights include reproduction, distribution, display, performance, and creation of derivative works. In the Anthropic case, authors claim the company infringed their copyrights by downloading and using their books without permission to train AI models, though the court distinguished between legitimate training uses and unauthorized library building.
Transformative use: A key factor in fair use analysis that examines whether the new work adds something new with a further purpose or different character from the original. Transformative uses are more likely to qualify for fair use protection because they advance the constitutional purpose of copyright law to promote creativity and knowledge. The court found AI training highly transformative because it creates new capabilities for generating text rather than simply reproducing existing works.
Pirated content: Copyrighted material that has been copied and distributed without authorization from rights holders. Anthropic downloaded millions of books from pirate libraries including Books3, Library Genesis, and Pirate Library Mirror. While the court approved fair use for training purposes, it rejected fair use defenses for building permanent libraries using pirated content, finding this constituted a separate non-transformative use that harmed copyright owners' market interests.
LibGen and PiLiMi: Library Genesis and Pirate Library Mirror are online repositories containing millions of pirated books and academic papers. Anthropic downloaded approximately 5 million books from LibGen and 2 million from PiLiMi using BitTorrent protocols. These libraries provided comprehensive metadata including ISBN numbers, titles, authors, and hash values that enabled systematic downloading and cataloging of copyrighted works for AI training purposes.
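The ISBN identifiers in these catalogs carry a built-in check digit, which is one reason they are useful for matching claimed works against downloaded files despite the roughly one percent cataloging error rates noted above. A minimal ISBN-13 validator, shown for illustration only (not drawn from the court record):

```python
def is_valid_isbn13(isbn: str) -> bool:
    """Validate an ISBN-13 via its weighted checksum (weights alternate 1, 3)."""
    digits = [int(c) for c in isbn if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

For example, "978-0-306-40615-7" passes this check, while the same string with its final digit altered fails, so single-digit transcription errors in a catalog are detectable.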
Class certification: The legal process by which courts determine whether a lawsuit may proceed as a class action on behalf of a large group of similarly situated plaintiffs. Courts must find that the case meets requirements including numerosity, commonality, typicality, and adequacy of representation. The Anthropic certification has been criticized for failing to conduct rigorous analysis of individual ownership issues and for creating an unprecedented notification scheme that may compromise due process protections.
Training data: The collection of text, images, audio, or other materials used to teach artificial intelligence models how to perform specific tasks. For language models, training data typically consists of books, articles, websites, and other textual content that help models learn patterns in human language. The quality and diversity of training data significantly affects model performance, leading AI companies to seek comprehensive datasets that may include copyrighted materials without explicit permission from rights holders.
Summary
Who: Senior U.S. District Judge William Alsup ruled in a copyright case between authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against AI company Anthropic PBC. The case involves up to 7 million potential class members whose copyrighted books were allegedly used without authorization.
What: The court delivered a split decision finding that using copyrighted books to train AI models constitutes fair use, while allowing claims over pirated content to proceed to trial for damages assessment. The subsequent class certification creates the largest copyright class action in history with potential damages reaching hundreds of billions of dollars.
When: The litigation began in August 2024, with the fair use ruling issued June 23, 2025, class certification on July 17, 2025, and Anthropic's appeal petition filed July 31, 2025. Trial is scheduled for December 2025, with the underlying conduct dating back to January 2021.
Where: The case was decided in the United States District Court for the Northern District of California, with Anthropic's appeal pending before the Ninth Circuit Court of Appeals. The implications extend nationwide and potentially globally given the precedential nature of AI copyright questions.
Why: The decision addresses fundamental questions about copyright protection in the age of artificial intelligence, balancing innovation incentives against creator rights while establishing precedent for AI training practices. The case matters for the marketing community because it could reshape how AI companies acquire and use content, potentially affecting advertising technology, content generation tools, and the broader digital marketing ecosystem that increasingly relies on AI-powered solutions.