OpenAI's CTO said she wasn't sure if Sora was trained on YouTube videos. YouTube's CEO says that would be a problem.

OpenAI's CTO Mira Murati (left) and YouTube's CEO Neal Mohan (right). Patrick T. Fallon/AFP; Mandel Ngan/AFP via Getty Images
  • Is OpenAI training its video generator Sora on YouTube content?

  • If it is, that would be a violation of YouTube's terms of service, its CEO said.

  • But OpenAI's own chief technology officer could not say whether Sora was trained on YouTube content.

OpenAI should not be using YouTube videos to train its artificial intelligence tools, YouTube's CEO says.

But is it?

OpenAI's chief technology officer, Mira Murati, said she doesn't know.

In an interview with The Wall Street Journal last month, Murati was asked if OpenAI's text-to-video generator Sora was trained on video content from YouTube.

"I'm actually not sure about that," Murati told the Journal.

YouTube's CEO Neal Mohan told Bloomberg on Thursday that he also does not know if OpenAI is using YouTube content to train its video generator.

If Sora is, in fact, using YouTube content, that would be a "clear violation" of the platform's terms of service, Mohan said.

"From a creator's perspective, when a creator uploads their hard work to our platform, they have certain expectations," Mohan told Bloomberg's Emily Chang. "One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform."

Mohan added that Google (which owns YouTube) does use some YouTube videos to train its own AI platform, Gemini, but only if the individual creators on the platform agreed to that in their contracts.

In response to Business Insider's request for comment, a YouTube spokesperson confirmed that the company's terms "prohibit unauthorized scraping or downloading of YouTube content."

OpenAI did not respond to Business Insider's request for comment.

The debate over what kinds of content tech companies are using to train their AI models has been gaining speed as the artificial intelligence industry explodes. And many artists and creators have been leading the charge, arguing that their copyrighted works cannot be used without their permission.

OpenAI is no stranger to lawsuits over the data collection practices behind its AI tools. Among those who have sued the company alleging copyright infringement are comedian and author Sarah Silverman (whose case was partially dismissed), "Game of Thrones" writer George R.R. Martin, and The New York Times.

In February, OpenAI asked the judge overseeing The Times' lawsuit to dismiss, in full or in part, four of the six counts the outlet had lodged against the company. In its filing, OpenAI alleged that the Times paid someone to hack into OpenAI's products.

And last summer, more than 8,000 authors wrote an open letter to AI leaders, including OpenAI's Sam Altman, demanding compensation for the use of their works to train AI tools without permission.

Read the original article on Business Insider