May 20, 2026

Major Publishers Challenge AI Training Practices in Landmark Copyright Suit Against Meta

Holland & Knight Alert
Courtney L. Batliner | Kisha Wilson

Highlights

  • Major publishing houses and bestselling author Scott Turow have sued Meta and Mark Zuckerberg, alleging Meta used pirated copyrighted works to train its Llama artificial intelligence (AI) models and removed copyright management information to conceal the sources.
  • The case builds on recent AI copyright rulings and focuses heavily on two key fair use issues: unlawful sourcing of training data and demonstrable market harm to copyright owners.
  • The litigation could shape how courts evaluate fair use in AI training cases, particularly when plaintiffs present evidence of licensing market disruption and AI-generated outputs that may substitute for original works.

Five major publishing houses – Elsevier, Cengage, Hachette Book Group, Macmillan Publishers and McGraw Hill – along with bestselling author Scott Turow, on May 5, 2026, filed a putative class action against Meta Platforms Inc. and its CEO, Mark Zuckerberg, in the U.S. District Court for the Southern District of New York (captioned Elsevier Inc. et al. v. Meta Platforms, Inc. et al., Case No. 1:26-cv-03689).

The complaint advances two principal theories of liability:

  • Willful Copyright Infringement. Plaintiffs allege that Meta committed willful copyright infringement by torrenting millions of copyrighted books and journal articles from pirate sites to train its Llama artificial intelligence (AI) large language models (LLMs).
  • CMI Violations. Plaintiffs claim that Meta violated copyright law by removing copyright management information (CMI) from the works to conceal their origins.

The plaintiffs seek statutory damages, injunctive relief and destruction of infringing copies.

Though building upon themes that have emerged in recent copyright suits against AI developers, the case raises new questions. For the first time, the plaintiff copyright holders include publishing companies, not just individual authors, and the defendants include an AI company and its CEO.

Positioning Relative to Prior Fair Use Rulings

The Elsevier litigation is shaped around two issues that proved dispositive in recent decisions favoring AI developers.

  • Lawful Sourcing: In Bartz v. Anthropic (N.D. Cal. June 2025), Judge William Alsup held that the fair use defense did not apply to Anthropic's retention of pirated copies sourced from online "shadow libraries," but that using lawfully acquired books to train Anthropic's LLM was "spectacularly" transformative and qualified as fair use despite the plaintiffs' claims of market harm.

Leveraging this distinction, the Elsevier complaint alleges that Meta trained its Llama models on more than 267 terabytes of copyrighted material from notorious pirate sites. It further claims that Meta masked its IP addresses to avoid detection when torrenting from unlawful sources; stripped CMI, including copyright notices, from the works it acquired from those sources; and abandoned legitimate licensing negotiations with publishers at Zuckerberg's direction. Taken together, these allegations are framed both to defeat a fair use defense under the logic applied in Bartz and to establish the willfulness needed to support enhanced damages.

  • Demonstrable Market Harm: In Kadrey v. Meta Platforms (N.D. Cal. June 2025), Judge Vince Chhabria ruled for Meta, relying heavily on the fourth fair use factor: the effect of the defendant's use on the potential market for plaintiff's work. In so doing, however, he noted that the Kadrey plaintiffs had "presented no meaningful evidence on market dilution" and that future plaintiffs with "better-developed records on the market effects" might prevail.

Apparently addressing those comments, the Elsevier complaint alleges evidence of market harm not available to the individual authors in Kadrey. It claims that Llama produces full-length scientific papers and journal articles, replacement chapters and study guides for academic textbooks, and other materials that substitute for the publisher plaintiffs' broad range of works. It also alleges Meta circumvented existing licensing markets for AI training materials, directly depriving publishers of revenue. If the Elsevier plaintiffs successfully establish such market harm, they may prevail where the Kadrey plaintiffs did not.

Practical Implications

  • For Copyright and Fair Use Law: This case could test whether evidence of unauthorized sourcing and market harm can outweigh a finding of transformativeness. A ruling in Meta's favor could reinforce the fair use doctrine for AI developers, even in the face of piracy allegations. Conversely, a ruling against Meta could narrow the fair use defense for AI developers, particularly where defendants allegedly bypass available licensing markets.
  • For Copyright Owners: This case may show that institutional plaintiffs with robust market data, an established licensing infrastructure and diverse copyrighted works are better positioned than individual authors to support claims of market harm. For authors and publishers alike, it may encourage copyright holders to monitor for and document evidence of market displacement from AI-generated material, and to develop and participate in well-defined licensing programs for AI training content.
  • For AI Developers: The complaint highlights potential liability for sourcing training data from unauthorized channels, particularly where licensing alternatives exist. The naming of Zuckerberg as a defendant in his personal capacity may signal an appetite among plaintiffs to pursue individual officer liability. Developers may see fit to evaluate their data provenance practices, document good-faith licensing efforts and assess whether their models' outputs could functionally substitute for source materials in ways that create market harm.

Key Takeaways

  • This is the first AI copyright case brought by major publishing houses, raising new questions about how institutional plaintiffs with robust market data, established licensing programs and broader categories of works may affect the fair use analysis.
  • The complaint centers on allegations of willful infringement, including sourcing training data from pirate sites and removal of CMI.
  • The outcome may turn on whether publishers can present the kind of concrete market-harm evidence that was absent in prior cases brought by individual authors.
  • The case could provide further guidance on how courts weigh data provenance and the existence and circumvention of licensing markets within the fair use framework.
  • AI developers should consider evaluating data provenance practices, documenting good-faith licensing efforts and assessing whether model outputs could substitute for source materials.
  • Copyright owners should consider developing licensing programs and documenting evidence of market displacement.

Holland & Knight will continue to monitor this litigation as it develops. For questions about how this case may affect your organization, please contact the authors or a member of our Intellectual Property Group.


Information contained in this alert is for the general education and knowledge of our readers. It is not designed to be, and should not be used as, the sole source of information when analyzing and resolving a legal problem, and it should not be substituted for legal advice, which relies on a specific factual analysis. Moreover, the laws of each jurisdiction are different and are constantly changing. This information is not intended to create, and receipt of it does not constitute, an attorney-client relationship. If you have specific questions regarding a particular fact situation, we urge you to consult the authors of this publication, your Holland & Knight representative or other competent legal counsel.


 
