Sora is making waves across the internet, and it's so popular that even my grandma, after catching a CNN news segment, asked me if my job would be at risk in the coming years. The concerns and uncertainty surrounding Sora are valid, but the model also sparks creativity and opens up exciting new possibilities.
What is Sora?
You've probably heard it before, but let me reiterate: Sora is an AI video generator that takes your text prompt and creates a video matching the description. Sounds similar to established names such as Runway ML and Pika Labs? Sora surpasses these earlier models in several respects:
- Sora is capable of creating up to a minute-long video.
- Sora can generate coherent and realistic videos with dynamic camera movement and different angles.
- Sora can simulate simple actions that affect the state of the world, such as a painter leaving brushstrokes on a canvas or bite marks left behind when a person eats a burger.
Is Sora Released Now?
Sam Altman announced Sora on X (Twitter) on Feb 16, 2024, and the video generation model was finally released to the public on December 9, 2024. It's now available on Sora.com for ChatGPT subscribers in the US and most other countries. The new Sora Turbo model supports text-to-video generation, video remixing, and image animation.
Sora Subscription Plans
To start creating videos with Sora, you'll need a premium subscription to one of OpenAI's plans: ChatGPT Plus or ChatGPT Pro. Here's a breakdown of what each plan offers:
ChatGPT Plus - $20/month
- Video Length: Up to 5 seconds.
- Resolution: 720p.
- Video Limit: 50 videos per month.
- Other Perks: Access to all other OpenAI products, including ChatGPT enhancements.
This plan is ideal for casual users who want to experiment with video generation and create short videos for quick projects or social media content.
ChatGPT Pro - $200/month
- Video Length: Up to 20 seconds.
- Resolution: 1080p.
- Video Limit: 500 videos per month.
- Video Variations: Generate up to 5 variations of a single video prompt, making it easier to compare options.
- Unlimited Relaxed Videos: Pro users can queue an unlimited number of videos, which are generated later when site traffic is low.
- Other Perks: Full access to all OpenAI products, plus priority access to new features.
How to Use Sora in Various Scenarios
According to what OpenAI has disclosed, Sora accepts text, image, and video prompts. It excels at 3D consistency, maintains coherence across scenes, and can simulate real-world people, animals, and landscapes.
Text to Videos
Sora can understand your written instructions and generate longer videos than earlier text-to-video models.
Prompt: A brown and white border collie stands on a skateboard, wearing sunglasses
Image to Images/Videos
You can turn a still image into new images, animations, or dynamic videos.
"Travel" Backward or Forward
Sora can extend a video segment both forward and backward in time, following your instructions. If AI image outpainting has impressed you before, this video-extension feature is simply mind-blowing.
With this feature, you can create looping videos that play back seamlessly forever.
Video to Video
- Style transformation: Sora can change the style and setting of an input video. For example, a video of a car driving along a mountain road can be transformed so that the car races through a lush jungle or along cyberpunk city streets.
- Video merging: Sora can merge two videos into one, blending gradually from one to the other for an enchanting visual transition.
AI Video to 3D Models
The ability to convert Sora AI videos into 3D models hasn't been demonstrated in OpenAI's official presentations or insider clips so far. However, this potential capability has been eagerly anticipated and enthusiastically discussed among AI enthusiasts. The excitement stems from the possibility that, if successfully implemented, it could revolutionize existing workflows in sectors like VFX, 3D modeling, gaming, and beyond.
Some AI enthusiasts are already testing Sora alongside other tools in 3D modeling workflows. For instance, you can upload Sora videos to Poly.cam and use its Gaussian Splat tool. Although the results are not yet ideal, they hint at what may become possible in the future.
Best Prompts for Sora AI Videos
Before Sora's public release, insider creations and the official showcase videos gave us a glimpse into its capabilities. Here are some of the best Sora prompts from that period.
Prompt: A red panda and a toucan are best friends taking a stroll through Santorini during the blue hour.
Prompt: In a beautifully rendered papercraft world, a steamboat travels across a vast ocean with wispy clouds in the sky. vast grassy hills lie in the distant background, and some sealife is visible near the papercraft ocean's surface
Prompt: A giant cathedral is completely filled with cats. there are cats everywhere you look. a man enters the cathedral and bows before the giant cat king sitting on a throne.
Prompt: POV footage of an ant navigating the inside of an ant nest.
The Technical Aspects: Spacetime Patches, Diffusion, and Transformer Models
OpenAI shares insights into how it built Sora on its research page. Here is a brief summary.
- Raw video is fed into a network trained by OpenAI that reduces the dimensionality of videos and images.
- The network outputs a latent representation that is compressed both temporally and spatially.
- Sora is trained on, and generates videos within, this compressed latent space.
- A decoder model then "translates" the generated latents (which we can't view directly) back into pixel space (which we can).
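To make the compress, generate, decode pipeline concrete, here is a toy sketch in Python. It is not OpenAI's code: as an assumption, the learned compression network is replaced by plain block-averaging and the decoder by simple value repetition, purely to show how a video's shape shrinks into a latent and grows back into pixels. The names `encode`, `decode`, `t_factor`, and `s_factor` are hypothetical.

```python
# Toy stand-ins for Sora's learned video compressor and decoder.
# ASSUMPTION: block-averaging and repetition replace the trained networks;
# they only illustrate the shape changes between pixel and latent space.
from typing import List

Video = List[List[List[float]]]  # [time][height][width], grayscale


def encode(video: Video, t_factor: int = 2, s_factor: int = 4) -> Video:
    """Compress temporally and spatially by averaging non-overlapping blocks."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    latent = []
    for t0 in range(0, T - T % t_factor, t_factor):
        frame = []
        for y0 in range(0, H - H % s_factor, s_factor):
            row = []
            for x0 in range(0, W - W % s_factor, s_factor):
                block = [video[t][y][x]
                         for t in range(t0, t0 + t_factor)
                         for y in range(y0, y0 + s_factor)
                         for x in range(x0, x0 + s_factor)]
                row.append(sum(block) / len(block))
            frame.append(row)
        latent.append(frame)
    return latent


def decode(latent: Video, t_factor: int = 2, s_factor: int = 4) -> Video:
    """Map a latent back to pixel space by repeating each value."""
    return [
        [
            [value for value in row for _x in range(s_factor)]
            for row in frame for _y in range(s_factor)
        ]
        for frame in latent for _t in range(t_factor)
    ]


# A hypothetical 16-frame, 64x64 clip.
video = [[[float((t + y + x) % 7) for x in range(64)] for y in range(64)]
         for t in range(16)]
latent = encode(video)      # a generative model would operate on data like this
restored = decode(latent)
print(len(latent), len(latent[0]), len(latent[0][0]))        # 8 16 16
print(len(restored), len(restored[0]), len(restored[0][0]))  # 16 64 64
```

The round trip restores the original shape but not every pixel value, because averaging throws detail away; a trained decoder exists precisely to recover as much of that detail as possible.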
As a diffusion model, Sora is trained to predict the original "clean" patches from noisy input patches. Its diffusion transformer architecture is what makes it scale well: in OpenAI's comparison of samples generated with a fixed seed and input, video quality improves markedly as training compute increases from the base amount to 4x and then 32x.
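The idea of predicting clean patches from noisy ones can be illustrated with the standard DDPM-style forward process, x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps. OpenAI has not published Sora's exact noise schedule or training objective, so treat this as a generic diffusion sketch: `alpha_bar=0.3` and the patch values are arbitrary, and the "model" is a hypothetical oracle that knows the true noise, standing in for the trained diffusion transformer, which can only approximate it.

```python
import math
import random
from typing import List, Tuple


def add_noise(clean: List[float], alpha_bar: float,
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Forward diffusion: x_t = sqrt(alpha_bar)*x_0 + sqrt(1 - alpha_bar)*eps."""
    noise = [rng.gauss(0.0, 1.0) for _ in clean]
    noisy = [math.sqrt(alpha_bar) * c + math.sqrt(1.0 - alpha_bar) * e
             for c, e in zip(clean, noise)]
    return noisy, noise


def predict_clean(noisy: List[float], predicted_noise: List[float],
                  alpha_bar: float) -> List[float]:
    """Invert the forward step using a noise estimate (the network's job)."""
    return [(x - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for x, e in zip(noisy, predicted_noise)]


rng = random.Random(0)  # fixed seed, echoing OpenAI's fixed-seed comparison
clean_patch = [0.2, -0.5, 0.9, 0.0]  # hypothetical latent patch values
noisy_patch, true_noise = add_noise(clean_patch, alpha_bar=0.3, rng=rng)

# HYPOTHETICAL oracle: a trained transformer would only approximate true_noise.
recovered = predict_clean(noisy_patch, true_noise, alpha_bar=0.3)
print(recovered)  # matches clean_patch up to floating-point rounding
```

In a real system the quality of `recovered` depends entirely on how well the network estimates the noise, which is where the extra training compute pays off.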
Patches are Sora's counterpart to the text tokens used by large language models such as ChatGPT. During training, spacetime patches are extracted from the compressed input videos and serve as the transformer's tokens.
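A minimal sketch of spacetime patch extraction, assuming the latent is stored as a nested [time][height][width] list and using a 2x2x2 patch size (an assumption; OpenAI has not disclosed Sora's actual patch dimensions). Each patch is flattened into one token vector. A real transformer would also project each vector through a learned linear layer, which is omitted here. The function name `extract_spacetime_patches` is hypothetical.

```python
from typing import List

Latent = List[List[List[float]]]  # [time][height][width]


def extract_spacetime_patches(latent: Latent, pt: int = 2, ph: int = 2,
                              pw: int = 2) -> List[List[float]]:
    """Cut a latent into non-overlapping pt x ph x pw patches and flatten
    each patch into a single token vector."""
    T, H, W = len(latent), len(latent[0]), len(latent[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):          # patches along time
        for y0 in range(0, H, ph):      # patches along height
            for x0 in range(0, W, pw):  # patches along width
                token = [latent[t][y][x]
                         for t in range(t0, t0 + pt)
                         for y in range(y0, y0 + ph)
                         for x in range(x0, x0 + pw)]
                tokens.append(token)
    return tokens


# Hypothetical 4x4x4 latent whose values encode their (t, y, x) position.
latent = [[[float(t * 100 + y * 10 + x) for x in range(4)] for y in range(4)]
          for t in range(4)]
tokens = extract_spacetime_patches(latent)
print(len(tokens), len(tokens[0]))  # 8 8
print(tokens[0])  # [0.0, 1.0, 10.0, 11.0, 100.0, 101.0, 110.0, 111.0]
```

Each token mixes neighbouring values across both space and time, which is what distinguishes a spacetime patch from a per-frame image patch.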
This patch-based scheme lets Sora act as a general simulator that isn't restricted to fixed video resolutions, aspect ratios, or durations, thereby avoiding the errors that other models make because of their fixed specifications.
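Because a transformer consumes a sequence of tokens, a clip with a different shape or length simply yields a different number of patches rather than requiring a fixed input size. The quick calculation below reuses the same assumed 2x2x2 patch size on hypothetical latent sizes; the numbers are illustrative and are not Sora's real dimensions. The helper `num_tokens` is hypothetical.

```python
def num_tokens(frames: int, height: int, width: int,
               pt: int = 2, ph: int = 2, pw: int = 2) -> int:
    """Count spacetime-patch tokens for a latent of the given size."""
    return (frames // pt) * (height // ph) * (width // pw)


# HYPOTHETICAL latent sizes (frames, height, width) for different clips.
clips = {
    "widescreen, short": (8, 32, 56),
    "vertical, short": (8, 56, 32),
    "widescreen, 4x longer": (32, 32, 56),
}
for name, (f, h, w) in clips.items():
    print(f"{name}: {num_tokens(f, h, w)} tokens")
# widescreen, short: 1792 tokens
# vertical, short: 1792 tokens
# widescreen, 4x longer: 7168 tokens
```

Flipping the orientation leaves the token count unchanged, and a longer clip just produces a longer sequence, so in principle the same model can handle both without cropping the video to one fixed shape.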
This helps explain why we are seeing such astonishingly realistic videos that stay consistent both spatially and temporally.
That is a brief overview of Sora based on OpenAI's website. From it, we can infer the model's capabilities and underlying goals, and those goals are ambitious.
While Sora is primarily framed as a text-to-video generator, it is also intended to serve as a platform for building "world simulators" or, in OpenAI's words, "general purpose simulators of the physical world." This is highlighted in the Sora research paper.
The technical details above suggest that spacetime patches are central to powering this world simulator.
Legacy Timeline
On March 25, 2024, OpenAI published a blog post titled "Sora: First Impressions," showcasing Sora's ability to "make things that are totally surreal." The post features seven videos created by visual artists, filmmakers, designers, and creative directors.
If you enjoyed the visuals, storytelling, and art direction showcased by these talented artists and directors, you can learn more about their projects. Here is a handy list of where to find them:
Paul Trillo:
- https://twitter.com/paultrillo
- https://www.instagram.com/paultrillo
Nik Kleverov:
- https://twitter.com/kleverov
- https://www.instagram.com/kleverov/
- Co-founder of: https://www.nativeforeign.tv/
August Kamp:
- https://www.instagram.com/august.kamp
- https://twitter.com/guskamp
- https://www.youtube.com/@augustkamp
Josephine Miller:
- https://www.instagram.com/josephinemiller
Don Allen Stevenson III:
- https://twitter.com/DonAllenIII
- https://www.instagram.com/donalleniii
Alexander Reben:
- https://twitter.com/artBoffin
- https://www.instagram.com/artboffin/
- https://areben.com/
According to Paul Trillo, working with Sora has liberated him as a director, allowing him to ideate and experiment in bold, exciting ways without the usual constraints of time and money.
This opens up a new era of abstract expressionism, enabling artists to expand on stories and concepts that were once thought impossible to visualize.