OpenAI, ChatGPT creator, unveils Sora to turn written prompts into videos: What to know

2025-01-19 15:13:51 Markets

OpenAI, the creator of ChatGPT, has unveiled Sora, its latest generative artificial intelligence tool. It makes short videos from prompts written by users.

The San Francisco-based company announced the news on Thursday and showed videos created by the new text-to-video generator on its website.

"We’re teaching AI to understand and simulate the physical world in motion with the goal of training models that help people solve problems that require real-world interaction," states OpenAI's website.

Footage of California during the gold rush, tiny pandas running around a petri dish and a gnome creating patterns in the zen garden of his snow globe enclosure are just some of the examples of what Sora, OpenAI's video creation tool, can make.

"We’re sharing our research progress early to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon," states OpenAI on its website.

In a tweeted announcement, Sam Altman, OpenAI's CEO, said a limited number of people will be able to use the new program for now. It's not publicly available just yet.

"We are starting red-teaming and offering access to a limited number of creators," said Altman in the post.

YouTube star puts Sora, new OpenAI tool, to the test

YouTube's biggest star, Jimmy Donaldson, aka MrBeast, replied to Altman's post, and the two engaged in some playful banter about the new tool.

In response, Altman said he'd make the YouTuber a video. Donaldson just needed to give him a prompt.

Donaldson asked for a video of a "monkey playing chess in a park," and Altman delivered.

How do I use Sora?

According to the announcement posted to OpenAI's website, Sora is going to be similar to OpenAI's text-to-image generator. Users just need to type out a prompt, and the program will give them a video of what they requested.

However, it can only be accessed by red teamers who will assess "critical areas for harms or risks" for the company and "a number of visual artists, designers, and filmmakers to gain feedback on how to advance the model to be most helpful for creative professionals."

It isn't available to the public, and there is no word on when the general public will be able to use it.

What can Sora do?

The program uses its "deep understanding of language" to interpret prompts and then create videos with "complex scenes" that are up to a minute long, with multiple characters and camera shots, as well as specific types of motion and accurate details.

The examples OpenAI gives range from an animated monster and kangaroo to realistic videos of people, like a woman walking down a street in Tokyo, or a cinematic movie trailer of a spaceman on a salt desert.

Embedded content: https://cdn.openai.com/sora/videos/monster-with-melting-candle.mp4

"Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle" is the first sentence of the prompt that created the 3D video above.

According to OpenAI, the videos displayed on its announcement page were all created by Sora.

Challenges that Sora faces

OpenAI states the program may struggle with the following:

Accurately simulating the physics of a complex scene.
Understanding instances of cause and effect. For example, someone might bite into a cookie, but the cookie doesn't have a bite mark afterward.
Keeping the spatial details of a prompt straight, such as telling left from right.
Following precise descriptions of events over time.

Embedded content: https://cdn.openai.com/sora/videos/grandma-birthday.mp4

One of the examples of what can go wrong is a video of a grandma blowing out candles on her birthday. But as she blows, the candles don't go out.

Prompt given for the video:

A grandmother with neatly combed grey hair stands behind a colorful birthday cake with numerous candles at a wood dining room table, expression is one of pure joy and happiness, with a happy glow in her eye. She leans forward and blows out the candles with a gentle puff, the cake has pink frosting and sprinkles and the candles cease to flicker, the grandmother wears a light blue blouse adorned with floral patterns, several happy friends and family sitting at the table can be seen celebrating, out of focus. The scene is beautifully captured, cinematic, showing a 3/4 view of the grandmother and the dining room. Warm color tones and soft lighting enhance the mood.

What's wrong with it? Well, according to OpenAI, "simulating complex interactions between objects and multiple characters is often challenging for the model, sometimes resulting in humorous generations."

Ethical and societal implications of AI

Folks have been bringing up the ethics behind AI since the technology became popular. Incidents involving high-ranking officials have already happened, like when AI-generated phone calls mimicked the president and encouraged people not to vote.

But OpenAI says it is working on safety steps before Sora becomes available to the public.

“We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model,” the company said in its statement. “We’re also building tools to help detect misleading content, such as a detection classifier that can tell when a video was generated by Sora.”

It says it's creating new techniques while also making sure the safety precautions that already apply to its other program, DALL·E 3, carry over to Sora.

For example, "our text classifier will check and reject text input prompts that are in violation of our usage policies, like those that request extreme violence, sexual content, hateful imagery, celebrity likeness or the IP of others," states the company. "We’ve also developed robust image classifiers that are used to review the frames of every video generated to help ensure that it adheres to our usage policies, before it’s shown to the user."