Newspaper publishers sue Microsoft, OpenAI over copyright infringement

Sam Altman, CEO of OpenAI, during a panel session at the World Economic Forum in Davos, Switzerland, on Jan. 18, 2024.

Stefan Wermuth | Bloomberg | Getty Images

Eight U.S. newspaper publishers filed suit against Microsoft and OpenAI in a New York federal court on Thursday, claiming the technology companies reuse their articles without permission in generative artificial intelligence products and attribute inaccurate information to them.

The legal challenge comes four months after The New York Times sued OpenAI over copyright infringement in the ChatGPT chatbot that the startup released in late 2022. OpenAI said in a January blog post that the case is without merit, adding it wants to support “a healthy news ecosystem.” Sam Altman, OpenAI’s CEO, said in January that the startup had wanted to pay The New York Times and was surprised to learn about the lawsuit.

In recent months, OpenAI has signed deals with a handful of media companies, including Axel Springer and The Financial Times, enabling the Microsoft-backed startup to draw on the publishers’ content in order to improve AI models. Google, which has its own general-purpose chatbot for responding to user queries, said in February that it had reached an agreement with Reddit that includes the right to train AI models on the platform’s content.

The group of eight newspaper publishers takes issue with ChatGPT and Microsoft’s Copilot assistant — available in the Windows operating system, the Bing search engine and other products the software maker produces — for “purloining millions of the publishers’ copyrighted articles without permission and without payment,” according to the complaint.

Microsoft and OpenAI representatives did not immediately respond to requests for comment. The newspaper publishers in the lawsuit operate The New York Daily News, The Chicago Tribune, The Orlando Sentinel, The Sun-Sentinel of Florida, The Mercury News of California, The Denver Post, The Orange County Register in California and The Pioneer Press of Minnesota.

They said OpenAI has drawn on data sets containing text from their newspapers to train its GPT-2 and GPT-3 large language models, which can spit out text in response to a few words of human input.

“The current GPT-4 LLM will output near-verbatim copies of significant portions of the publishers’ works when prompted to do so,” said the complaint, which includes several examples of ChatGPT and Copilot allegedly doing so.

The publishers said Microsoft copies information from their newspapers for the Bing search index, which helps to inform answers in Copilot. But such output doesn’t always provide links to newspaper websites, where readers can view ads alongside articles or pay for subscriptions.

The New York Times case also touched on the matter of OpenAI models regurgitating information from its articles. In its blog post, OpenAI characterized such behavior as “a rare failure of the learning process that we are continually making progress on.”
