Class Action Says Makers of ChatGPT Engaged in ‘Systematic Theft’ of Authors’ Copyrighted Works
Last Updated on October 16, 2023
The Authors Guild et al. v. OpenAI Inc. et al.
Filed: September 19, 2023 | Case No. 1:23-cv-08292
A proposed class action filed by a professional writers’ association and 17 authors accuses the companies behind ChatGPT of violating their exclusive rights by illegally reproducing and “training” the artificial intelligence (AI) chatbot with their copyrighted works without consent.
In the 47-page lawsuit, the Authors Guild and several high-profile novelists, including John Grisham, George R. R. Martin, Jodi Picoult, David Baldacci and Michael Connelly, challenge OpenAI, Inc. and a “tangled thicket of interlocking entities”—which together are responsible for the creation and propagation of ChatGPT—over what they allege is “mass-scale copyright infringement that violates the rights of all working fiction writers and their copyright holders.”
The suit relays that OpenAI has readily admitted to reproducing copyrighted works and feeding them into its “large language models” (LLMs), which are predictive algorithms designed to generate responses to user prompts based on the texts and datasets on which they are “trained.” The more text LLMs are fed, the better they “learn” to generate sophisticated, human-like responses, the case explains.
“In other words, books are the high-quality materials [the defendants] want, need, and have therefore outright pilfered to develop generative AI products that produce high-quality results: text that appears to have been written by a human writer,” the complaint contends.
The filing stresses that “at the heart” of OpenAI’s commercial success is “systematic theft on a mass scale” without “a word of permission from or a nickel of compensation to copyright owners.”
According to the lawsuit, the defendants admit that the LLMs were “trained” on Common Crawl, a massive repository of public data “scraped from billions of web pages” and known to contain texts copied from pirate websites. Per the suit, OpenAI also points to another body of internet-based texts used to “train” its algorithms, the so-called “Books2” dataset. Though the companies refuse to disclose the sources of this dataset, some independent AI researchers believe it similarly contains “ebook files downloaded from large pirate book repositories,” the case shares.
Although the defendants knew the “training” data included copyrighted material, they “willfully proceeded” without obtaining consent from those whose rights were violated and whose livelihoods are “seriously threaten[ed]” by ChatGPT, the complaint alleges.
“[OpenAI’s] LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create,” the filing argues. “Moreover, [the defendants’] LLMs can spit out derivative works: material that is based on, mimics, summarizes, or paraphrases [the plaintiffs’] works, and harms the market for them.”
In June 2023, the Authors Guild published an open letter signed by almost 12,000 authors that protested the failure of OpenAI and other major tech companies to fairly license writers’ works for use in algorithm “training,” the lawsuit relays. As the suit tells it, the letter also highlights the companies’ allegedly exploitative practices to date and the threats posed to writers’ jobs by AI applications such as ChatGPT.
“[The defendants] could have ‘trained’ their LLMs on works in the public domain. They could have paid a reasonable licensing fee to use copyrighted works. What [the defendants] could not do was evade the Copyright Act altogether to power their lucrative commercial endeavor, taking whatever datasets of relatively recent books they could get their hands on without authorization. There is nothing fair about this. [The defendants’] unauthorized use of [the plaintiffs’] copyrighted works thus presents a straightforward infringement case applying well-established law to well-recognized copyright harms.”
The lawsuit looks to represent anyone in the United States who is the sole author, and sole legal or beneficial owner, of an eligible copyright in one or more works of fiction that have sold at least 5,000 copies and whose text has been, or is being, used to “train” one or more of OpenAI’s LLMs. An eligible copyright is any copyright that was registered with the United States Copyright Office before or within five years after first publication of the work, and whose effective date of registration is either within three months after first publication or before the defendants began using the work to “train” their LLMs. The suit also seeks to cover those who are the sole legal or beneficial owners of eligible copyrights in one or more such works held by literary estates.