Alongside yesterday's release of Gemini 2.0, Google introduced the new Multimodal Live API, which helps developers build applications with real-time audio and video streaming.
The API enables low-latency, bidirectional interactions over text, audio, and video, with audio and text output, for a more natural, fluid, human-like conversational experience. Users can interrupt the model at any time, and can share a camera feed or screen recording and ask questions about what it shows.
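Under the hood, a session of this kind is driven over a bidirectional WebSocket connection, where the client's first message is a setup frame selecting the model and the desired output modalities. The sketch below builds such a frame; the field names (`setup`, `generation_config`, `response_modalities`) and the model identifier follow published examples and are assumptions that should be checked against the current API reference.

```python
import json

def build_setup_message(model: str, modalities: list[str]) -> str:
    """Serialize a hypothetical initial setup frame for a live session.

    Field names here mirror publicly shown examples of the Multimodal
    Live protocol and are assumptions, not a definitive schema.
    """
    frame = {
        "setup": {
            "model": f"models/{model}",
            "generation_config": {"response_modalities": modalities},
        }
    }
    return json.dumps(frame)

# Request a text-only session; ["AUDIO"] could be requested instead
# for spoken responses.
msg = build_setup_message("gemini-2.0-flash-exp", ["TEXT"])
print(msg)
```

After sending a frame like this, the client would stream audio or video chunks and receive model output incrementally on the same connection, which is what makes mid-response interruption possible.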
The model's video understanding extends this paradigm further: users can point a camera at something or share their desktop in real time and ask questions about it. The API is now available to developers, along with a demo application of a multimodal real-time assistant.