Info Pulse Now

It sure looks like OpenAI trained Sora on game content -- and legal experts say that could be a problem

By Kyle Wiggers

OpenAI didn't initially respond to a request for comment. But shortly after this story was published, a PR rep said that they would "check with the team."

If game content is indeed in Sora's training set, it could have legal implications -- particularly if OpenAI builds more interactive experiences on top of Sora.

"Companies that are training on unlicensed footage from video game playthroughs are running many risks," Joshua Weigensberg, an IP attorney at Pryor Cashman, told TechCrunch. "Training a generative AI model generally involves copying the training data. If that data is video playthroughs of games, it's overwhelmingly likely that copyrighted materials are being included in the training set."

Probabilistic models

Generative AI models like Sora are probabilistic. Trained on a lot of data, they learn patterns in that data to make predictions -- for example, that a person biting into a burger will leave a bite mark.

This is a useful property. It enables models to "learn" how the world works, to a degree, by observing it. But it can also be an Achilles' heel. When prompted in a specific way, models -- many of which are trained on public web data -- produce near-copies of their training examples.

That has understandably displeased creators whose works have been swept up in training without their permission. An increasing number are seeking remedies through the court system.

Microsoft and OpenAI are currently being sued for allegedly allowing their AI tools to regurgitate licensed code. Three companies behind popular AI art apps, Midjourney, Runway, and Stability AI, are in the crosshairs of a case accusing them of infringing on artists' rights. And major music labels have sued two startups developing AI-powered song generators, Udio and Suno, for infringement.

Many AI companies have long claimed fair use protections, asserting that their models create transformative -- not plagiaristic -- works. Suno makes the case, for example, that indiscriminate training is no different from a "kid writing their own rock songs after listening to the genre."

But there are certain unique considerations with game content, says Evan Everist, an attorney at Dorsey & Whitney specializing in copyright law.

"Videos of playthroughs involve at least two layers of copyright protection: the contents of the game as owned by the game developer, and the unique video created by the player or videographer capturing the player's experience," Everist told TechCrunch in an email. "And for some games, there's a potential third layer of rights in the form of user-generated content appearing in software."

Everist gave the example of Epic's Fortnite, which lets players create their own game maps and share them for others to use. A video of a playthrough of one of these maps would concern no fewer than three copyright holders, he said: (1) Epic, (2) the person using the map, and (3) the map's creator.

"Should courts find copyright liability for training AI models, each of these copyright holders would be potential plaintiffs or licensing sources," Everist said. "For any developers training AI on such videos, the risk exposure is exponential."

Weigensberg noted that games themselves have many "protectable" elements, like proprietary textures, that a judge might consider in an IP suit. "Unless these works have been properly licensed," he said, "training on them may infringe."

TechCrunch reached out to a number of game studios and publishers for comment, including Epic, Microsoft (which owns Minecraft), Ubisoft, Nintendo, Roblox, and Cyberpunk developer CD Projekt Red. Few responded -- and none would give an on-the-record statement.

"We won't be able to get involved in an interview at the moment," a spokesperson for CD Projekt Red said. EA told TechCrunch it "didn't have any comment at this time."

Risky outputs

It's possible that AI companies could prevail in these legal disputes. The courts may decide that generative AI has a "highly convincing transformative purpose," following the precedent set roughly a decade ago in the publishing industry's suit against Google.

In that case, a court held that Google's copying of millions of books for Google Books, a sort of digital archive, was permissible. Authors and publishers had tried to argue that reproducing their IP online amounted to infringement.

"The key questions around whether AI models' use of copyrighted materials constitutes copyright infringement remain unsettled," Jesse Saivar, chair of Greenberg Glusker's IP and digital media and technology groups, told TechCrunch. "Is there copying of copyrighted works during the training process, and does that constitute copyright infringement? Does it impact the market for the original work? [And] can the copyright owners of the training materials even allege any actual damage or injury?"

A ruling in favor of AI companies wouldn't necessarily shield their users from accusations of wrongdoing. If a generative model regurgitated a copyrighted work, a person who then went and published that work -- or incorporated it into another project -- could still be held liable for IP infringement.

"Generative AI systems often spit out recognizable, protectable IP assets as output," Weigensberg said. "Simpler systems that generate text or static images often have trouble preventing the generation of copyrighted material in their output, and so more complex systems may well have the same problem no matter what the programmers' intentions may be."

Some AI companies have indemnity clauses to cover these situations, should they arise. But the clauses often contain carve-outs. For example, OpenAI's applies only to corporate customers -- not individual users.

There are also risks besides copyright to consider, Weigensberg says, like violating trademark rights.

"The output could also include assets that are used in connection with marketing and branding -- including recognizable characters from games -- which creates a trademark risk," he said. "Or the output could create risks for name, image, and likeness rights."

The growing interest in world models could further complicate all this. One application of world models -- which OpenAI considers Sora to be -- is essentially generating video games in real time. If these "synthetic" games resemble the content the model was trained on, that could be legally problematic.

"Training an AI platform on the voices, movements, characters, songs, dialogue, and artwork in a video game constitutes copyright infringement, just as it would if these elements were used in other contexts," Avery Williams, an IP trial lawyer at McKool Smith, said. "The questions around fair use that have arisen in so many lawsuits against generative AI companies will affect the video game industry as much as any other creative market."

This article originally appeared on TechCrunch at https://techcrunch.com/2024/12/11/it-sure-looks-like-openai-trained-sora-on-game-content-and-legal-experts-say-that-could-be-a-problem/
