Google DeepMind CEO Demis Hassabis revealed on Possible, a podcast co-hosted by LinkedIn co-founder Reid Hoffman, that Google plans to eventually combine its Gemini AI models with its Veo video-generation models to improve Gemini's understanding of the physical world.
"We've built Gemini, the base model, as a multimodal model from the beginning," Hassabis said, "because we have a vision of building a universal digital assistant that can actually help you in the real world."
At present, the AI industry as a whole is gradually moving toward "omni" models capable of understanding and generating multiple forms of media. Google's latest Gemini models can generate audio as well as images and text, while OpenAI's default model in ChatGPT can now create images, including Hayao Miyazaki-style artwork. Amazon has also announced plans to release an "any-to-any" model later this year.
According to 1AI, these "omni" models require large amounts of training data, including images, video, audio, and text. Hassabis hinted that Veo's video data comes mainly from Google's YouTube platform, saying, "By watching tons of YouTube videos, Veo 2 is able to understand the physical laws of the world." Google previously told TechCrunch that its models may be trained on "some" YouTube content under an agreement with YouTube creators. The company reportedly broadened its terms of service last year partly to allow more data to be used for training its AI models.