On July 17, 2025, the United States District Court for the Northern District of California issued a landmark ruling certifying the first class action in the United States to challenge the unauthorized use of copyrighted works to train generative artificial intelligence models. This litigation pits a group of prominent authors and copyright holders against Anthropic PBC, a major player in the generative AI sector.
The plaintiffs are distinguished authors and their corporate entities:
They claim ownership of the exclusive reproduction rights in their respective works, which they exploit individually or through their corporate entities. Acting as class representatives, they seek redress not only for their own works but also for those of similarly situated copyright owners whose works were allegedly infringed.
The defendant, Anthropic PBC, is a U.S.-based artificial intelligence company developing large language models (LLMs), including the Claude family of models. Founded in 2021, Anthropic is among the leading firms in the AI industry, alongside OpenAI and Google DeepMind. The plaintiffs allege that Anthropic unlawfully reproduced millions of copyrighted works without authorization or compensation, using them to train its LLMs.
According to the plaintiffs, between 2021 and 2022, Anthropic downloaded massive quantities of copyrighted books from unauthorized sources to build its training datasets. Specifically, the plaintiffs allege that Anthropic:
The plaintiffs assert that Anthropic stored these pirated copies in its internal library and integrated them into training corpora for its LLMs. They argue that metadata (ISBN, ASIN, MD5 hashes) embedded in the downloaded files can reliably identify the infringed works.
The plaintiffs brought suit under the Copyright Act (17 U.S.C. §§ 101 et seq.), alleging:
Anthropic opposed class certification, arguing:
In his opinion, Judge William Alsup rejected Anthropic’s objections and granted partial certification for a class limited to works downloaded from LibGen and PiLiMi. The certified class is defined as:
“All legal and beneficial owners of copyrights in works registered with the Copyright Office within five years of first publication that Anthropic downloaded from Library Genesis (LibGen) or PiLiMi.”
The court held that:
The court denied certification for two other proposed classes – the “Books3 Pirated Books Class” and the “Scanned Books Class” – due to insufficient evidence regarding file quality and copyright ownership.
The court ordered that:
This decision represents a critical milestone in regulating the use of copyrighted works to train AI systems. By certifying this class action, the federal court signals to the AI industry that large-scale ingestion of copyrighted materials without proper authorization may trigger collective legal remedies.
The ruling may inspire similar actions in other jurisdictions and contribute to shaping international norms for the lawful and ethical use of protected works in AI training datasets.
This article was prepared by a French lawyer specializing in intellectual property and artificial intelligence. For legal advice on U.S. law, consultation with a qualified local attorney is recommended.