Court rules Meta used copyrighted books legally for AI training

On June 25, 2025, a federal judge ruled in Case 3:23-cv-03417-VC that Meta's use of copyrighted books to train its Llama large language models constitutes fair use, granting the company's motion for summary judgment in a closely watched copyright infringement case.

Judge Vince Chhabria of the Northern District of California found that while Meta's copying of 666 books from thirteen authors was substantial and commercial, the transformative nature of using the works to train artificial intelligence models outweighed potential market harm concerns. The decision, however, applies only to these specific plaintiffs and does not shield Meta from future copyright claims by other authors.

The case, Kadrey v. Meta Platforms Inc., began when authors including Sarah Silverman, Junot Díaz, and Andrew Sean Greer sued Meta for allegedly downloading their books from shadow libraries including Library Genesis and Anna's Archive without permission. According to court documents, Meta obtained the books through BitTorrent protocols between October 2022 and early 2024, after attempts to negotiate licensing deals with publishers proved unsuccessful.

Summary

Who: Thirteen authors including Sarah Silverman, Junot Díaz, and Andrew Sean Greer sued Meta Platforms Inc. Judge Vince Chhabria of the Northern District of California ruled on the case. Industry analyst Jason Kint of Digital Content Next provided commentary highlighting key aspects of the decision.

What: The court granted Meta's motion for summary judgment, ruling that the company's use of copyrighted books to train its Llama large language models constitutes fair use under copyright law. However, the decision applies only to these specific plaintiffs, with Kint emphasizing that other authors can still pursue similar claims.

When: The ruling was issued on June 25, 2025, after Meta downloaded the authors' books from shadow libraries between October 2022 and early 2024 for AI training purposes.

Where: The case was decided in the US District Court for the Northern District of California. Meta obtained the copyrighted books from shadow libraries including Library Genesis and Anna's Archive using BitTorrent protocols.

Why: The court found Meta's use highly transformative, explicitly rejecting AI industry arguments about equivalence between human reading and machine learning. Kint highlighted the judge's distinction between human learning and AI processing as particularly significant for future cases, while noting the limited scope of this specific ruling.

Meta initially allocated up to $100 million for book licensing, according to the ruling. However, the company discovered that publishers typically do not hold subsidiary rights for AI training purposes, which remain with individual authors. After escalation to CEO Mark Zuckerberg in spring 2023, Meta decided to proceed with using unlicensed material downloaded from shadow libraries.

The court analyzed the case under the four-factor fair use test established in copyright law. Judge Chhabria determined that Meta's use was highly transformative, noting the fundamental difference between reading a book and using it to train an AI system. "An LLM ingests text to learn statistical patterns of how words are used together in different contexts," the judge wrote, distinguishing this from human consumption of literature.

Industry observers have been closely analyzing the decision's implications. Jason Kint, CEO of Digital Content Next, urged careful attention to the actual ruling rather than potentially misleading interpretations. According to Kint, who shared the full court opinion on social media, the decision warrants thorough examination given its potential for mischaracterization by industry advocacy groups.

Kint particularly highlighted Judge Chhabria's explicit rejection of arguments suggesting equivalence between human learning and AI training. The court's analysis directly addressed claims by AI executives, including OpenAI's Sam Altman, who have attempted to establish a "right to learn" for large language models. According to the ruling, this distinction between human reading and machine processing carries significant legal weight in fair use determinations.

The transformative nature proved decisive despite Meta's commercial motivations. The company projects generating $2-3 billion in revenue from generative AI in 2025, with potential earnings of $460 billion to $1.4 trillion over the next decade. However, the court found that commercial use alone did not preclude fair use protection when the copying serves a sufficiently transformative purpose.

Central to the decision was the court's finding that Meta's Llama models cannot meaningfully reproduce the plaintiffs' works. Even when subjected to "adversarial prompting" designed to make the system regurgitate training data, Llama produced no more than 50 words from any plaintiff's book. The plaintiffs' own expert testified that Llama could not reproduce "any significant percentage" of their works.

The authors' arguments focused primarily on two theories of market harm: that Llama could reproduce portions of their books, and that Meta's unauthorized use damaged the emerging market for AI training licenses. The court rejected both arguments, finding insufficient evidence of actual market substitution and ruling that authors cannot claim exclusive rights to license their works for transformative uses.

More significantly, the court addressed but ultimately dismissed a third theory of harm that could prove crucial in future AI copyright cases. This "market dilution" theory suggests that AI systems trained on copyrighted works could flood markets with similar competing content, indirectly harming original authors. While acknowledging this concern as potentially "far more promising" than other arguments, Judge Chhabria found the plaintiffs presented insufficient evidence to support their claims.

"No other use—whether it's the creation of a single secondary work or the creation of other digital tools—has anything near the potential to flood the market with competing works the way that LLM training does," the judge wrote. However, he noted that the plaintiffs "barely give this issue lip service" and presented no evidence about how Llama's outputs might dilute markets for their specific works.

The decision directly contradicted reasoning from a related case decided two days earlier, in which Judge William Alsup ruled differently on similar fair use questions. Judge Chhabria criticized Alsup's comparison of AI training to "training schoolchildren to write well" as an "inapt analogy," noting that children cannot generate "countless competing works with a miniscule fraction of the time and creativity it would otherwise take." Kint specifically drew attention to this judicial disagreement, describing it as noteworthy "Judge on Judge action" regarding the appropriateness of educational analogies in AI training contexts.

While favorable to Meta, the ruling contains several limiting factors that Kint emphasized in his analysis. The decision applies only to the thirteen named plaintiffs, leaving other authors whose works were used in Llama training free to pursue their own copyright claims. According to Kint, this limitation means "the rest of the class can sue again on the same claims. And mark my words they will."

The court also left Meta's alleged distribution violations through torrenting unresolved, scheduling a July 11 case management conference to address those issues. This distinction between reproduction and distribution rights could prove significant in ongoing litigation, as Kint noted the importance of this legal differentiation.

According to analysis published on PPC Land, this ruling could significantly impact how digital marketing platforms approach AI development and training data acquisition. The marketing industry has been closely monitoring AI copyright decisions as companies increasingly deploy large language models for content generation and campaign optimization.

The decision also signals that future AI copyright cases will likely focus on market dilution arguments rather than direct copying claims. Legal experts suggest that content creators in industries particularly vulnerable to AI-generated competition—such as news media, stock photography, and commercial writing—may find stronger grounds for copyright protection.

Meta's victory, while substantial, may prove narrow in scope. The court explicitly noted that "in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission" and that companies developing AI systems should expect to "generally need to pay copyright holders for the right to use their materials."

The ruling emphasizes that fair use analysis remains highly fact-specific, with outcomes dependent on the particular works involved, the AI system's capabilities, and evidence of actual market effects. This case-by-case approach suggests continued uncertainty for AI developers seeking to train models on copyrighted content.

For the marketing technology sector, which relies heavily on AI-powered tools for content creation, audience analysis, and campaign optimization, the decision provides some clarity while highlighting ongoing legal risks. Companies developing marketing AI systems may need to carefully document their training data sources and consider proactive licensing arrangements with content creators.

The plaintiffs' failure to present compelling evidence of market harm proved decisive, but the court's detailed discussion of market dilution theory provides a roadmap for future cases. As AI-generated content becomes more sophisticated and prevalent, copyright battles in the digital marketing space are likely to intensify.

Judge Chhabria concluded that Meta's unauthorized use of copyrighted books was fair use under current circumstances, but warned that future cases with better-developed records on market effects might reach different conclusions. "In cases involving uses like Meta's, it seems like the plaintiffs will often win, at least where those cases have better-developed records on the market effects of the defendant's use," he wrote.

The decision represents a significant development in the evolving legal landscape surrounding AI and copyright, providing both guidance and caution for technology companies navigating the complex intersection of artificial intelligence and intellectual property law. As Kint's analysis suggests, careful attention to the ruling's specific limitations and judicial reasoning will be crucial for understanding its broader implications.

Timeline

October 2022: Meta begins downloading Library Genesis database to evaluate potential for LLM training
February 2023: Meta releases Llama 1
Spring 2023: Meta abandons licensing negotiations after escalation to CEO Mark Zuckerberg, decides to use unlicensed shadow library content
July 2023: Meta releases Llama 2; authors file initial lawsuit
Early 2024: Meta downloads Anna's Archive compilation
April 2024: Meta releases Llama 3 and Meta AI chatbot
November 2024: Court dismisses Raw Story's DMCA case against OpenAI over training data
May 2025: US Copyright Office releases comprehensive AI training report examining when developers need permission for copyrighted works
June 2025: Reddit sues Anthropic over unauthorized AI training
June 23, 2025: Judge Alsup issues contrasting ruling in Bartz v. Anthropic case
June 25, 2025: Judge Chhabria grants Meta's summary judgment motion
July 11, 2025: Scheduled case management conference for remaining distribution claims