Authors File Lawsuit Against Anthropic's Claude AI for Copyright Violations

Key Takeaways

  • A group of authors has filed a lawsuit against Anthropic for alleged copyright infringement.
  • The lawsuit claims Anthropic used pirated books to train its AI chatbot, Claude.
  • Last month, several content publishers accused Anthropic of aggressive web crawling of their content.

A group of authors has launched a legal battle against artificial intelligence (AI) startup Anthropic.

They are alleging that the company committed “large-scale theft” by using pirated copies of copyrighted books to train its AI chatbot, Claude. 

This lawsuit, filed on Monday in a federal court in San Francisco, marks the first legal action against Anthropic from book authors, though the company is already facing similar challenges from music publishers.

The lawsuit was brought by three writers – Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson – seeking to represent a class of similarly situated fiction and non-fiction authors. 

The writers accuse Anthropic of tapping into repositories of pirated writings to build its AI product, claiming that the company’s actions “have made a mockery of its lofty goals” of being a responsible and safety-focused AI developer.

They argue that Anthropic’s model “seeks to profit from strip-mining the human expression and ingenuity behind each one of those works.” 

The writers specifically point to a dataset called ‘The Pile,’ which allegedly includes a trove of pirated books, as a source of training data for Claude.

Anthropic was founded by former OpenAI leaders and has positioned itself as a more responsible player in the AI field, promoting the concept of “Constitutional AI” with chatbots that adhere to a central set of programming rules. 

However, this lawsuit challenges that image, accusing the company of large-scale copyright infringement.

Notably, the case against Anthropic is part of a broader legal trend targeting AI companies over copyright issues. 

OpenAI and Microsoft are already facing lawsuits from prominent authors such as John Grisham and George R. R. Martin, as well as media outlets including The New York Times.

These cases all center on the claim that tech companies have used copyrighted materials without permission or compensation to train their AI models.

The AI Fair Use Debate

Anthropic and other AI companies have argued that using copyrighted materials for AI training falls under US law’s “fair use” doctrine. 

This doctrine allows limited use of copyrighted material without permission for purposes such as research or uses that transform the original work into something new. However, this interpretation is now being challenged in the latest lawsuit against Anthropic.

The plaintiffs dispute the applicability of fair use to AI training, arguing that AI systems do not learn like humans. To illustrate this point, they contend that “Humans who learn from books buy lawful copies of them, or borrow them from libraries that buy them, providing at least some measure of compensation to authors and creators.”

Moreover, the lawsuit against Anthropic highlights broader concerns about how AI companies acquire the data used to train their models.

In July, iFixit CEO Kyle Wiens accused Anthropic's AI crawler of aggressively crawling the site, straining its web resources, and violating its Terms of Use.