Google unveiled Gemini Live at today's Pixel 9 series phone launch. The service begins rolling out today, first to English-speaking Gemini Advanced subscribers.
Designed for natural, fluid conversation
Google says Gemini Live offers a mobile voice experience that lets users hold free-flowing conversations with Gemini.
Gemini Live is essentially Google's counterpart to OpenAI's newly launched Advanced Voice Mode for ChatGPT (currently in limited alpha testing), which uses an enhanced voice engine to enable more coherent, emotionally expressive, and realistic multi-turn conversations.
Google says users can interrupt the chatbot mid-speech to ask follow-up questions, and it will adapt to their speech patterns in real time.
Here is an excerpt from Google's blog post:
- With Gemini Live [available in the Gemini app], users can talk to Gemini and choose from [10 new] natural-sounding voices for it to respond with.
- Users can even speak at their own pace, or interrupt mid-answer and ask clarifying questions, just as they would in a human conversation.
Google demoed a Gemini Live scenario in which the chatbot plays the part of a hiring manager, letting the user rehearse an interview while it offers advice on presentation skills and suggestions for improvement.
A Google spokesperson said:
- Live uses our Gemini Advanced model, which we've tuned to be more conversational. The model's large context window comes into play when users have long conversations with Live.
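Gemini Live itself is not exposed through a public API, but for readers curious what a multi-turn, long-context conversation with a Gemini model looks like in code, here is a minimal sketch using Google's google-generativeai Python SDK. The model name and API key are placeholders, and the specific model powering Live is an assumption; this only illustrates the general idea of a chat whose full history stays within the model's context window.

```python
# Illustrative sketch only: Gemini Live has no public API. This uses the
# standard google-generativeai SDK to show a multi-turn chat in which the
# accumulated history is carried along, so earlier parts of a long
# conversation remain available to the model's large context window.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key

# A long-context Gemini model; the exact model behind Live is not public.
model = genai.GenerativeModel("gemini-1.5-pro")

# start_chat keeps the conversation history and resends it on each turn.
chat = model.start_chat(history=[])

reply = chat.send_message(
    "Help me rehearse a job interview. You play the hiring manager."
)
print(reply.text)

# A follow-up turn: the model still sees the earlier exchange.
feedback = chat.send_message("Was my opening answer too long? Give feedback.")
print(feedback.text)
```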
Multimodal input not yet supported
Gemini Live doesn't yet have one of the features Google showed off at I/O: multimodal input.
Google released a prerecorded video this past May showing Gemini Live seeing and reacting to a user's surroundings through photos and video captured by a phone's camera, such as naming a part on a broken bike or explaining what a section of code on a computer screen does.
Google said multimodal input will be available "later this year," but declined to give specifics.