What the New York Times’ copyright suit means for AI


Hello Eye on AI readers and Happy 2024!

As many of you know, I was on leave for the past several months, working on a book about the generative AI revolution and all its potential ramifications. The book is due to be published this summer by Simon & Schuster, and I’ll share more about it as the publication date approaches. Now back at Fortune, I’m taking on a new role as our AI editor, helping to build out our coverage of this vital technology. And I’ve got some exciting news: Eye on AI will be coming to your inbox more frequently. AI is one of the hottest topics in the corporate world, and it is advancing rapidly. To make sure you, as business leaders, have all the AI news you need to stay informed, Eye on AI will now be delivered twice a week, on Tuesdays and Thursdays. Imagine: you’ll be twice as knowledgeable as before!

OK, the biggest news in AI this past week has got to be the copyright infringement lawsuit the New York Times filed against Microsoft and OpenAI in federal court on Dec. 27. It’s a doozie, one many think will be precedent-setting. Some commentators speculated it could even spell the end of OpenAI, and perhaps the entire business model on which many generative AI companies have been built. The suit doesn’t include a specific claim for damages but says the two tech companies should be held liable for “billions of dollars in statutory and actual damages.”

OpenAI, which had been in talks with the Times since April over possible licensing terms for the newspaper’s content, said it had thought negotiations were progressing and that it was “surprised and disappointed” by the Times’ suit. “We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from A.I. technology and new revenue models,” OpenAI spokesperson Lindsey Held said. “We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.” Microsoft declined to comment on the lawsuit.

The Times alleges that tens of thousands of its articles were copied, without its permission, in the process of training the GPT models that underpin OpenAI’s ChatGPT and Microsoft’s Copilot (formerly called Bing Chat). It also alleges that ChatGPT and Copilot allow users to further infringe on the Times’ copyrights by producing text that plagiarizes Times articles. It argues that the integration of OpenAI’s GPT models with web browsing and search tools steals commercial referrals and traffic from the newspaper’s own website. In a novel claim for this sort of case, the publisher also alleges its reputation is damaged when OpenAI’s models hallucinate, making up information and falsely attributing it to the Times. Among the reams of evidence that the Times submitted in support of its claims is a 127-page exhibit that includes 100 examples of OpenAI’s GPT-4 outputting lengthy verbatim passages from Times articles when prompted with just a sentence, or part of a sentence, from the original.

The Times lawsuit is certainly the most significant of the copyright infringement claims that have been filed against OpenAI and Microsoft to date. The Times has top copyright lawyers, relatively deep pockets, and a history of pursuing claims all the way to the Supreme Court when it feels an issue threatens not just its own journalism but the free press as a whole. The newspaper is claiming here that OpenAI’s copyright infringement undercuts the revenues publications require to serve the public interest through news reporting and investigative journalism. This sets it apart from most of the other copyright infringement claims previously filed against OpenAI, which simply pit the commercial interests of creators against those of OpenAI. But what really differentiates the Times’ case is the clarity of the narrative and exhibits it presents. Many commentators believe these will prove highly persuasive to a jury if the case winds up in front of one.

Gary Marcus, the emeritus New York University cognitive scientist and vocal AI expert, opined, in a series of posts on X (formerly Twitter), that this is OpenAI’s Napster moment. He claims the Times’ lawsuit could wind up bankrupting the high-flying AI startup, just as a landmark 2001 copyright judgment against Napster obliterated the peer-to-peer music-sharing company’s business model and eventually drove it under.

Having done a fair bit of research into AI and copyright for my forthcoming book, I think this is unlikely to happen. For one, this case is likely to settle. The newspaper was in negotiations with OpenAI for a licensing deal and only filed suit after those talks apparently reached an impasse (probably because the Times was asking for more money than OpenAI wanted to pay). That is a good indication that, despite the public interest gloss the Times applied to its complaint, its real motivation here is commercial. OpenAI has signed a deal with the Associated Press to license its content for AI training, and last month it inked a multiyear deal giving it access to the current and archived content of publisher Axel Springer, which owns Business Insider and Politico. That deal is worth more than $10 million per year, according to one report. OpenAI and Microsoft have a strong incentive to settle rather than deal with years of legal uncertainty; chances are, they will.

Even if this case goes to trial, a ruling might not ultimately go the Times’ way. Microsoft has deeper pockets than the Times and also has access to top-notch legal talent. And there are more precedents here than just Napster. Copyright experts vigorously debate which cases might be most analogous—the Google Books case, the Sega case, the Sony case, or the recent Andy Warhol case. The specifics of these analogies are too complicated to get into here. But the point is, this is far from a settled matter, and OpenAI and Microsoft have decent arguments they can use to try to defend themselves. It isn’t open and shut by any means.

It is also possible that the U.S. Copyright Office or Congress will weigh in before the Supreme Court does. The Copyright Office has just concluded a public comment period on the implications of generative AI. The Senate also recently held hearings on the topic. It is possible Congress will step in and pass a new law that would render the Times’ claim moot. Some legal scholars have suggested Congress should create a “fair learning” law that gives software companies an explicit right to use copyrighted material for AI training. Meanwhile, those sympathetic to rights holders have suggested lawmakers should mandate that creators are compensated for any works used to train AI. Congress could also insist that AI companies apply filters to screen out any model outputs that are identical to copyrighted material used in training. There is a precedent for Congress weighing in this way: The 1992 Audio Home Recording Act exempted sellers of digital audio tape from being sued for copyright infringement. But it also set up a licensing fee that all manufacturers and importers of audio recording devices have to pay to the Copyright Office, which then distributes those funds as royalty payments to music rights holders. Congress could wind up establishing a similar licensing and royalty regime for generative AI software.

Finally, even if OpenAI is ultimately forced to pay creators licensing fees, it can probably afford it. The company is, according to some news accounts, currently bringing in revenue at a $1.6 billion per year clip, with some insiders predicting that this figure will hit $5 billion before 2024 is out. With this kind of cash machine, OpenAI can probably survive. While copyright infringement claims sank Napster, Spotify was eventually able to reach a settlement with music rights holders. And while those payments crimped Spotify’s profits, and the company has lately struggled to sell stock investors on a convincing growth story, Spotify is also not about to go bust.

So, no, I don’t think OpenAI will go under. But I do think the Times’ lawsuit signals that the era of freely using copyrighted material for AI training is coming to an end. The threat of lawsuits will push most companies building AI models to license any data they use. For instance, there are reports that Apple is currently in discussions to do exactly this for the data it is seeking to train its own AI models. In image generation, artists are also increasingly turning to masking technology designed to make it much harder to effectively train AI models on their work without consent. Similar technology does not yet exist for text or music, but researchers are working on it. And plenty of publishers have now taken steps to prevent their websites from being freely scraped by web crawlers. Pretty soon, the only way companies are going to be able to obtain the data they need to train good generative AI models is if they pay to license it. One way or another, the sun is setting on the Wild West of generative AI.

And with that, more AI news below.

Jeremy Kahn
jeremy.kahn@fortune.com
@jeremyakahn

This story was originally featured on Fortune.com
