Google Releases Multimodal Live API: Unlocking Watching, Listening, and Speaking for a New AI Audio and Video Interaction Experience

Alongside yesterday's release of Gemini 2.0, Google launched the new Multimodal Live API, which helps developers build applications with real-time audio and video streaming capabilities.

The API enables low-latency, bidirectional interaction over text, audio, and video, with text and audio output, for a more natural, fluid, human-like dialogue experience. Users can interrupt the model at any time, and can share camera input or a screen recording and ask questions about what it shows.
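As a rough illustration of what such a bidirectional session looks like in code, here is a minimal sketch using the google-genai Python SDK. The model name "gemini-2.0-flash-exp", the config shape, and the session methods reflect the launch-era SDK and should be treated as assumptions that may differ in later versions:

```python
# Minimal sketch of a bidirectional Multimodal Live session (assumptions:
# google-genai SDK at launch, model "gemini-2.0-flash-exp", text output).
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def main() -> None:
    # The API also supports audio output; "TEXT" keeps this example simple.
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send a user turn; end_of_turn signals the model may respond.
        await session.send(input="Hello, what can you do?", end_of_turn=True)
        # Responses stream back incrementally over the same connection.
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

asyncio.run(main())
```

Because the session is a persistent streaming connection rather than a request/response call, a client can cut in with new input mid-response, which is what makes the interruption behavior described above possible.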

The model's video understanding extends this interaction paradigm: users can point their camera at something or share their desktop in real time and ask questions about what the model sees. The API is now available to developers, and a demo application of the multimodal real-time assistant is available to try.
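For camera or screen input, the same live session accepts media chunks alongside text. The hedged sketch below streams a single JPEG frame and asks a question about it; the blob-style send() payload mirrors launch-era example code and is an assumption, and capturing the frame itself is left out:

```python
# Hedged sketch: stream one captured JPEG frame into a live session and ask
# about it. Assumes the same google-genai SDK as above; the dict payload
# (mime_type + data) and encoding details may differ across SDK versions.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

async def ask_about_frame(jpeg_bytes: bytes, question: str) -> None:
    config = {"response_modalities": ["TEXT"]}
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send the frame as a media blob, then the question as the turn's text.
        await session.send(input={"mime_type": "image/jpeg", "data": jpeg_bytes})
        await session.send(input=question, end_of_turn=True)
        async for response in session.receive():
            if response.text:
                print(response.text, end="")

# Example usage with a pre-captured frame:
# asyncio.run(ask_about_frame(open("frame.jpg", "rb").read(), "What is on screen?"))
```

A real assistant would send frames continuously on a timer while audio and text flow over the same session, rather than one frame per connection as shown here.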
