Mark Zuckerberg, Meta hit with explosive copyright lawsuit over alleged use of pirated books to train Llama AI


Mark Zuckerberg faces fresh AI copyright lawsuit. Image Credit: Will Oliver/EPA/Bloomberg—Getty Images

Meta CEO Mark Zuckerberg is facing a major legal challenge after five leading publishers and bestselling author Scott Turow filed a class-action lawsuit accusing the tech giant of using pirated books and journal articles to train its Llama artificial intelligence models.

The lawsuit, filed in the United States District Court for the Southern District of New York, alleges that Meta illegally downloaded copyrighted content from piracy databases and unauthorized sources to build and improve its AI systems. The case could become one of the most significant copyright battles in the rapidly expanding AI industry, with implications for publishers, authors, and technology companies worldwide.



Publishers involved in the suit include Macmillan Publishers, Hachette Book Group, McGraw Hill, Elsevier, and Cengage.

Publishers Accuse Meta of Using Pirated Books for AI Training

According to court filings, the plaintiffs claim Meta used pirated libraries such as LibGen, Anna’s Archive, and Sci-Hub to obtain millions of copyrighted works without authorization or licensing agreements.

The complaint alleges that Meta engineers downloaded textbooks, fiction titles, scientific journals, and educational materials to train successive versions of the Llama AI model. Several popular books were reportedly included in the training datasets, including The Fifth Season and The Wild Robot.

The lawsuit further claims that Meta removed copyright management information from the materials and distributed infringing copies through torrenting activities connected to AI development.

Plaintiffs also allege that Zuckerberg personally authorized aspects of the data acquisition process, and parts of the complaint seek to hold him individually liable.



Meta Defends AI Training as ‘Fair Use’

Meta has strongly rejected the allegations and said it will aggressively contest the lawsuit.

A spokesperson for the company argued that courts have already recognized AI training on copyrighted material as potentially qualifying under “fair use” protections in U.S. copyright law.

The company is relying partly on legal momentum gained in 2025, when a federal court ruled in Meta’s favor in a similar copyright case involving authors who accused the company of illegally using books to train Llama.

In that decision, the judge ruled that the plaintiffs failed to sufficiently demonstrate market harm caused by Meta’s AI systems. The court also described AI training as “highly transformative,” noting that Llama did not reproduce large portions of copyrighted works directly.

However, the judge separately acknowledged concerns regarding the alleged use of pirated repositories, suggesting such practices could indicate bad faith under certain circumstances.




Why the New Meta AI Lawsuit Matters

The new case could reshape how AI companies source training data and how courts interpret copyright protections in the age of generative artificial intelligence.

The plaintiffs argue that AI-generated summaries, imitations, and “copycat books” produced by systems like Llama threaten the livelihoods of authors and publishers.

According to the complaint, AI-generated content is already flooding online marketplaces, potentially reducing demand for original human-authored works.



Turow described Meta’s alleged conduct as “shameless, damaging and unjust,” arguing that one of the world’s richest technology companies knowingly used pirated copies of copyrighted books to train commercial AI products.

Industry experts say the case is being closely watched because it combines claims related to both piracy and AI-generated market disruption.

Growing Pressure on AI Companies Over Copyright Issues

The legal pressure facing Meta reflects a broader wave of lawsuits targeting major AI developers.

Companies including OpenAI, Google, Anthropic, and xAI have all faced legal scrutiny over the use of copyrighted content for training large language models.

Last year, Anthropic reportedly agreed to a massive settlement with authors over similar allegations involving AI training data.

Meanwhile, publishers and creators continue pushing for stricter safeguards, licensing agreements, and compensation structures to prevent unauthorized use of their intellectual property.

Mark Zuckerberg’s Role Under Scrutiny

The lawsuit places unusual attention on Zuckerberg personally, claiming the Meta founder actively encouraged the acquisition of copyrighted material through unauthorized sources.

Legal analysts say this could intensify public scrutiny around executive accountability in AI development.

The plaintiffs are seeking permanent injunctions, statutory damages, attorneys’ fees, and an order forcing Meta to destroy allegedly infringing datasets used in training Llama models.

If successful, the case could force sweeping changes across the AI industry and potentially slow the development of large-scale generative AI systems dependent on massive text datasets.

 

 

FAQ

Why are publishers suing Mark Zuckerberg and Meta?

Major publishers and author Scott Turow allege that Meta illegally used pirated books, journals, and copyrighted materials to train its Llama AI models without permission or licensing agreements.

What is the Meta Llama AI lawsuit about?

The lawsuit claims Meta downloaded copyrighted works from piracy databases like LibGen and Anna’s Archive to develop and improve its generative AI system known as Llama.

Did Mark Zuckerberg personally approve the use of pirated books?

According to the complaint, Zuckerberg allegedly authorized or encouraged the use of infringing datasets during AI development. Meta has denied wrongdoing.

What is Llama AI?

Llama is Meta’s family of large language models designed for generating text, answering questions, coding, and powering AI applications.

What books were allegedly used by Meta?

The complaint references several works, including The Fifth Season and The Wild Robot, among others.

What does Meta say about the allegations?

Meta argues that AI training may qualify as “fair use” under U.S. copyright law and says courts have previously supported aspects of that defense.

Has Meta faced similar lawsuits before?

Yes. Meta has previously faced lawsuits from authors, publishers, and media organizations over claims that copyrighted content was used to train AI models.

Could the lawsuit affect the future of AI?

Yes. The outcome could influence how AI companies obtain training data, whether licensing becomes mandatory, and how copyright laws apply to generative AI systems.

What are the publishers asking the court to do?

The plaintiffs want the court to award damages, stop alleged infringement, and require Meta to destroy datasets containing illegally acquired copyrighted works.

Why is this lawsuit important for authors?

Authors and publishers argue that AI-generated summaries and imitation works could reduce book sales and threaten the long-term value of original creative content.