Comedian and New Hampshire native Sarah Silverman has joined a class-action lawsuit against OpenAI and another against Meta accusing the companies of copyright infringement, saying they “copied and ingested” her protected work in order to train their artificial intelligence programs, according to court papers.
The lawsuits, in which she joined authors Christopher Golden and Richard Kadrey, were filed Friday in the San Francisco Division of the U.S. District Court for the Northern District of California. Each suit says the company in question made copies of the authors’ works, including Silverman’s memoir, “The Bedwetter,” without permission by scraping illegal online “shadow libraries” that contain the texts of thousands of books.
The lawsuit against Meta cites the company’s own research paper about LLaMA, a large language model it uses to train chatbots. According to the paper, made public in February, scientists included text from The Pile within their training data set; the lawsuit says some of that text comes from shadow libraries.
“Their copyrighted materials were copied and ingested as part of training,” the lawsuit claims. “Many of the plaintiffs’ books appear in the data set that Meta admitted to using.”
Neither OpenAI nor Meta responded to requests for comment on the lawsuits Monday. The plaintiffs are seeking awards for damages and injunctive relief that might include changes to the LLaMA and ChatGPT programs.
Less is known about the source of training data sets for OpenAI’s ChatGPT program. But the lawsuit states that ChatGPT’s ability to generate summaries of the plaintiffs’ works is “only possible if ChatGPT was trained on Plaintiffs’ copyrighted works.”
The text ChatGPT generated when asked to summarize “The Bedwetter” is included as an exhibit.
“One of the key topics in the first part of the memoir is Silverman’s struggle with enuresis, or bed-wetting, which extended into her teenage years,” the program wrote. “This issue caused her significant distress and embarrassment, but also fueled her resilience and ability to deal with adversity.”
Joseph Saveri and Matthew Butterick, the attorneys representing the three authors, are also representing other creators in separate litigation challenging Copilot, an AI-powered coding assistant on GitHub, and an image generator produced by Stability AI.
On a website publicizing their litigation against AI companies, the lawyers assert that “much of the material in the training data sets used by OpenAI and Meta comes from copyrighted works — including books written by plaintiffs — that were copied by OpenAI and Meta without consent, without credit, and without compensation.”
The dual lawsuits belong to a growing number of legal actions that could define the boundaries of how artificial intelligence learns and what role copyright law will play in determining the material that algorithms can use in training data sets.
“I expect more to follow,” said Robert deBrauwere, who specializes in digital media and intellectual property at the law firm Pryor Cashman, where he is a partner. He is not involved in the Silverman lawsuit.
This article originally appeared in The New York Times.