OpenAI to launch multimodal AI digital assistant: can talk by voice, recognize objects, sources say

According to The Information, OpenAI recently demonstrated to some customers a new multimodal AI model capable of voice conversations and object recognition. Sources say this may be among the announcements OpenAI plans to make on May 13.

According to the report, the new model can process image and audio information faster and more accurately than OpenAI's existing standalone image recognition and text-to-speech models. For example, it could help customer service agents "better understand a caller's tone of voice and determine if they are using a sarcastic tone." Theoretically, the model could also assist students in learning math or translating real-world sign language.

However, the source also noted that while the model outperforms GPT-4 Turbo on certain types of questions, it can still confidently give wrong answers.

Developer Ananay Arora posted a screenshot of call-related code, suggesting that OpenAI may be adding phone-call capability to ChatGPT. Arora also found some evidence that OpenAI is configuring servers for real-time audio and video communication.

OpenAI CEO Sam Altman has categorically denied that the upcoming release is GPT-5, the large language model said to be significantly better than GPT-4. The Information says that GPT-5 could be officially unveiled before the end of the year. Altman also said that OpenAI will not release a new AI search engine.

If The Information's report is accurate, OpenAI's new release could take some attention away from the upcoming Google I/O developer conference. Google is also known to be testing technology that uses AI to make phone calls. Additionally, Google is rumored to be working on a project codenamed "Pixie," a multimodal Google Assistant replacement that recognizes objects through the device's camera and provides users with information such as "how to get to the place of purchase" or how to use the object.
