Google releases Gemini Live: supports AI voice chat, simulates interview scenarios, and recommends presentation skills

At today's Pixel 9 series phone launch event, Google announced Gemini Live. The service begins rolling out today, initially to English-speaking Gemini Advanced subscribers.

Natural, free-flowing conversation

Google says Gemini Live provides a mobile conversational experience that lets users have free-flowing conversations with Gemini.

Gemini Live is essentially Google's counterpart to OpenAI's newly launched Advanced Voice Mode for ChatGPT (currently in limited alpha testing); it employs an enhanced speech engine to deliver more coherent, emotionally expressive, and realistic multi-turn conversations.


Google says users can interrupt the chatbot while it's talking to ask follow-up questions, and the chatbot will adapt to the user's speech patterns in real time.

An excerpt from Google's blog post (translated) follows:

  • With Gemini Live [in the Gemini app], users can talk to Gemini and choose from [10 new] natural-sounding voices for it to respond with.
  • Users can even speak at their own pace, or interrupt mid-answer and ask clarifying questions, just as they would in a human conversation.

In one demo, Google showed Gemini Live simulating a conversation between a user and a hiring manager (played by the AI), with Gemini offering recommendations on presentation skills and suggestions for improvement.

A Google spokesperson said:

  • Live uses our Gemini Advanced model, which we've tweaked to make it more conversational. The model's large context window is used when users are having long conversations with Live.
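
Gemini Live itself is a consumer feature of the Gemini app rather than a developer API, but for readers curious what a long, multi-turn conversation against a long-context Gemini model looks like programmatically, here is a minimal sketch using Google's public google-generativeai Python SDK. The model name, prompts, and environment variable are illustrative assumptions, not details confirmed by Google's announcement.

```python
# Minimal sketch: a multi-turn chat with a long-context Gemini model via the
# public google-generativeai SDK. This is NOT the Gemini Live product itself;
# the model name, prompts, and API-key environment variable are assumptions.
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # hypothetical env var name

# A long-context model; the exact tuned model behind Gemini Live is not public.
model = genai.GenerativeModel("gemini-1.5-pro")

# start_chat() keeps the running history, so earlier turns remain in the
# model's context window as the conversation grows longer.
chat = model.start_chat(history=[])

print(chat.send_message("Let's do a mock job interview. You're the hiring manager.").text)
print(chat.send_message("Actually, can you focus on my presentation skills instead?").text)
```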

Multimodal input not yet supported

Gemini Live doesn't yet have one of the features Google showed off at I/O: multimodal input.

In a prerecorded video released this past May, Google showed Gemini Live seeing and reacting to a user's surroundings through photos and video captured by the phone's camera, for example naming a part on a broken bike or explaining what a piece of code on a computer screen does.

Google said multimodal input will be available "later this year," but declined to give specifics.
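
The camera-based capability described above is a Gemini Live app feature rather than something documented for developers, but multimodal (image plus text) input already exists in the public Gemini API. The sketch below shows that general pattern only; the model name, file name, and prompt are assumptions for illustration, not the Gemini Live feature itself.

```python
# Minimal sketch: sending an image together with a text prompt to a Gemini
# model via the public google-generativeai SDK. Illustrates multimodal input
# in general, not the Gemini Live camera feature; names are assumptions.
import os

import PIL.Image
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # hypothetical env var name

model = genai.GenerativeModel("gemini-1.5-flash")

# Hypothetical photo of a bike, echoing the demo described above.
image = PIL.Image.open("broken_bike.jpg")

response = model.generate_content(
    ["Which part of this bike looks broken, and what is it called?", image]
)
print(response.text)
```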
