Images taken from videos generated by Google Veo.

Google / Benj Edwards

On Tuesday at Google I/O 2024, Google announced Veo, a new AI video synthesis model that can generate HD videos from text, image, or video prompts, similar to OpenAI's Sora. It can create 1080p videos lasting more than a minute and edit videos according to written instructions, but it has not yet been released for widespread use.

Veo reportedly can edit existing videos using text commands, maintain visual continuity between frames, and generate video sequences lasting 60 seconds or longer from a single prompt or from a series of prompts that form a story. The company says it can create detailed scenes and apply cinematic effects such as time-lapses, aerial shots, and various visual styles.

Since the launch of DALL-E 2 in April 2022, we've seen a parade of new image and video synthesis models that let anyone who can type a text description create detailed images or videos. While neither technology is perfect, both AI image and video generators are steadily becoming more capable.

Back in February, we covered a preview of OpenAI's Sora video generator, which many believed at the time represented the best AI video synthesis the industry had to offer. It impressed Tyler Perry enough that he put the expansion of his film studio on hold. However, so far OpenAI has not provided general access to the tool; instead, it has limited its use to a select group of testers.

Now, at first glance, Google's Veo appears to have Sora-like video generation capabilities. We haven't tried it ourselves, so we can only go by the cherry-picked demonstration videos the company provided on its website. Anyone viewing them should take Google's claims with a large grain of salt, because the generation results may not be typical.

Veo's example videos include a cowboy riding a horse, a fast-tracking shot down a suburban street, kebabs on a grill, a time-lapse of a sunflower opening, and more. Conspicuously absent are detailed depictions of humans, which have historically been difficult for AI image and video models to generate without obvious distortions.

Google says Veo builds on the company's previous video generation models, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet, and Lumiere. To improve quality and efficiency, Veo uses compressed "latent" video representations, and Google trained it on videos with more detailed captions, which allows the AI to interpret prompts more accurately.

Veo is also notable for its support for filmmaking commands: "When given both an input video and an editing command, like adding kayaks to an aerial shot of a coastline, Veo can apply this command to the initial video and create a new, edited video," the company says.

While the demonstrations may seem impressive at first glance (especially compared to Will Smith eating spaghetti), Google admits that creating AI videos is difficult. “Maintaining visual consistency can be challenging for video generation models,” the company writes. “Characters, objects, or even entire scenes can disrupt the viewing experience by shaking, jumping, or changing unexpectedly between frames.”

Google says it mitigated these drawbacks with "advanced latent diffusion transformers," which is essentially meaningless marketing talk without specifics. But the company is confident enough in the model that it is working with actor Donald Glover and his studio, Gilga, on an AI-generated demonstration film that will debut soon.

Initially, Veo will be available to select creators through VideoFX, a new experimental tool on Google's AI Test Kitchen website, labs.google. Creators can join the VideoFX waitlist to potentially gain access to Veo features in the coming weeks. Google plans to bring some of Veo's capabilities to YouTube Shorts and other products in the future.

There's no word yet on where Google got the training data for Veo (if we had to guess, YouTube was probably involved). But Google says it's taking a "responsible" approach with Veo. According to the company, "Videos created by Veo are watermarked using SynthID, our cutting-edge tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks."