Google Releases Multimodal Live API: Watching, Listening, and Speaking Unlock a New AI Audio and Video Interaction Experience

Alongside yesterday's release of Gemini 2.0, Google introduced the new Multimodal Live API, which helps developers build applications with real-time audio and video streaming capabilities.

The API enables low-latency, bidirectional text, audio, and video interaction, with audio and text output, for a more natural, fluid, human-like conversational experience. Users can interrupt the model at any time and interact with it through shared camera input or screen sharing, asking questions about what is shown.

The model's video comprehension capabilities extend the communication paradigm: users can point their camera at something or share their desktop in real time and ask questions about it. The API is now available to developers, and a demo application of the multimodal real-time assistant is also open to users.
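The bidirectional, streaming nature of the API can be sketched with the google-genai Python SDK. This is a minimal, hedged sketch: the model name `gemini-2.0-flash-exp`, the `response_modalities` config key, and the `live.connect` / `send` / `receive` call shapes reflect the SDK around the Gemini 2.0 release and may differ in current versions; the API key is a placeholder.

```python
import asyncio

async def main() -> None:
    # Imported lazily so the sketch can be read without the SDK installed;
    # install with `pip install google-genai` (assumed package name).
    from google import genai

    client = genai.Client(api_key="YOUR_API_KEY")  # placeholder, not a real key
    # Ask for text output; "AUDIO" could be requested instead for spoken replies.
    config = {"response_modalities": ["TEXT"]}

    # live.connect opens a bidirectional streaming session with the model;
    # audio and video frames can be sent over the same session.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(
            input="What are real-time streaming APIs good for?",
            end_of_turn=True,
        )
        # Responses stream back incrementally until the model's turn completes.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

if __name__ == "__main__":
    asyncio.run(main())
```

Because the session stays open in both directions, the client can keep sending new input (including an interruption) while responses are still streaming back, which is what enables the "interrupt the model at any time" behavior described above.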
