At the Google I/O 2024 developer conference, Google launched Veo, a strong competitor to OpenAI's Sora. Let's take a closer look at this product.
Veo Highlights
- Veo is a text-to-video generation model from Google DeepMind that can create high-quality 1080p videos longer than 60 seconds.
- It can combine text and image prompts to generate videos that follow both inputs.
- It also supports editing videos with text instructions, including edits confined to specific areas of the video.
About Google Veo
Veo is a text-to-video generation model developed by Google DeepMind. It can generate high-quality 1080p videos more than 60 seconds long in a wide range of cinematic and visual styles.
Besides creating new videos, it can edit existing ones by following text-based instructions, modifying the footage to suit the user's needs.
One of Veo's strengths is its ability to generate videos from image and text prompts together. Users can provide a text prompt along with a reference image, and Veo blends the visual style of the image with the content of the text prompt to produce the video.
To improve Veo’s ability to understand and accurately execute prompts, Google DeepMind enriched its training data with more detailed video captions.
In addition, the model works with high-quality, compressed representations of video (known as latents), which makes generation more efficient. Together, these measures improve overall video quality and reduce generation time.
Veo's Versatility
Veo uses advanced natural language processing and visual semantic understanding to capture the nuance and tone of a text prompt and render intricate details in complex scenes.
It offers creative control and understands prompts for cinematic effects, such as time-lapses, close-ups, or aerial shots of a landscape.
Beyond generating videos from scratch, Veo can edit existing videos, including adding or modifying specific elements within a scene.
It also supports masked editing, which confines modifications to specific areas of the video. The following example shows how a video can be edited in this way.
Veo's latent diffusion transformer reduces visual inconsistencies in generated videos, keeping people, objects, and styles from flickering, jumping, or morphing between frames, which improves the overall viewing experience.
It can generate clips longer than 60 seconds, either from a single prompt or from a sequence of prompts that together tell a story.
Veo's goal is to democratize video production, helping experienced filmmakers, content creators, and educators unlock the potential of storytelling and share knowledge through engaging visuals.
How to Use Veo
Like OpenAI's Sora, Google Veo is not yet available to the public, but a limited number of creators can access it through VideoFX, Google's new experimental tool.
If you are interested in Veo, you can apply to join the waitlist at https://deepmind.google/technologies/veo/.
Clicking the Sign Up button takes you to a new page, where clicking "Sign in with Google" redirects you to the Google Labs login page.
You can then fill out a Google Form to join the waitlist for a chance to try some of VideoFX's features. Note, however, that Veo is currently available only in a few countries.
Final Thoughts
As technology continues to evolve, the arrival of Veo and Sora not only represents innovation in video production but also signals that creative tools are becoming more widely accessible and democratized.
Whether you are a professional producer or an everyday content creator, platforms like these let you showcase your creativity, convey your message, and share your stories.