March 7, 2025 - At the end of last March, OpenAI announced a "small preview" of an artificial intelligence service called Voice Engine, claiming that the technology can clone a person's voice from just 15 seconds of speech. Nearly a year later, however, the tool has yet to be officially launched, and OpenAI has not revealed if and when it will go fully live.
OpenAI's cautious approach to the Voice Engine may be rooted in concerns about misuse of the technology, or it may be an attempt to avoid regulatory scrutiny. The company has previously been accused of focusing too much on "flashy products" at the expense of safety, and of rushing to release products before competitors.
In an interview with TechCrunch, an OpenAI spokesperson said that the company is still testing Voice Engine with a limited number of "trusted partners". The spokesperson said, "We are learning from our partners' use of the technology to improve the utility and safety of the model. We are excited to see it being used in a variety of scenarios, including speech therapy, language learning, customer support, gaming characters, and AI avatars."
According to IT Home, Voice Engine is the technology behind the voices in OpenAI's Text-to-Speech API and ChatGPT's voice mode, and is capable of generating natural speech that closely resembles the original speaker. The tool converts written text into speech while placing certain restrictions on content. From the beginning, however, the release of Voice Engine has been plagued by delays and constant changes to the release window.
According to a June 2024 OpenAI blog post, the Voice Engine model learns to predict the sounds a speaker is most likely to make for a given text transcription, taking into account different voices, accents, and speaking styles. It can then generate not only spoken versions of the text but also "phonetic expressions" that reflect how different types of speakers would read the text aloud.
Initially, OpenAI planned to introduce Voice Engine (then called Custom Voices) to its API on March 7, 2024, and planned to make it available to up to 100 "trusted developers" in advance. Priority was to be given to developers of applications that have "social value" or demonstrate "innovative and responsible" use of the technology. OpenAI also set prices for the service: $15 per million characters for "standard" speech and $30 per million characters for "HD" speech. However, at the last minute, the company delayed the launch. A few weeks later, OpenAI announced Voice Engine without a sign-up option, limiting access to only about 10 developers with whom the company had been working since late 2023.
In March 2024, OpenAI stated in the blog post announcing Voice Engine, "We want to start conversations about the responsible deployment of synthetic speech and how society can adapt to these new capabilities. Based on the results of these conversations and small-scale testing, we will make more informed decisions about whether and how to deploy this technology at scale."
According to OpenAI, Voice Engine has been in development since 2022. The company claims to have demonstrated the tool's potential and risks to policymakers at the highest levels around the world in the summer of 2023. Voice Engine is currently available to several partners, including Livox, a startup that develops more natural communication devices for people with disabilities. Its CEO, Carlos Pereira, told TechCrunch that although Livox was unable to integrate Voice Engine into its products because the tool requires a network connection (many of Livox's customers don't have Internet access), he found the technology "impressive."
Speaking to TechCrunch via email, Pereira said, "The quality of the voice and the possibility for the voice to be able to speak in different languages is unique - especially for our customers, the disabled. This is the most impressive and easy-to-use tool for creating speech I've ever seen. We hope OpenAI develops an offline version soon." He added that he has not yet received any guidance from OpenAI about the possible release of Voice Engine, nor has he seen any indication that the company plans to start charging for it. Currently, Livox has not incurred any fees for its use.
In a June 2024 blog post, OpenAI hinted that one of the reasons for the delay in releasing Voice Engine was concern that the technology could be misused during last year's US election cycle. Based on discussions with stakeholders, OpenAI has built several safety measures into Voice Engine, including watermarking the generated audio so that its origin can be traced.
According to OpenAI, developers must obtain "explicit consent" from the original speaker before using Voice Engine. The company has also stated that its policies require "explicitly disclosing" to audiences that the voice is generated by AI. However, the company has not yet explained how it will enforce these policies. Even for a company with OpenAI's resources, enforcing them at scale could be challenging.
In the blog post, OpenAI also hinted at a desire to develop a "voice authentication experience" to verify the identity of the speaker, and to create a "banned list" that prevents the creation of voices that sound too similar to those of well-known personalities. Both projects are extremely ambitious technologically and, if handled poorly, could reflect badly on a company often accused of neglecting safety initiatives.
With AI voice cloning technology evolving rapidly, effective filtering and authentication are fast becoming essential requirements for its responsible release. According to one report, AI voice cloning was the third fastest-growing scam of 2024. It has led to an increase in fraud and the bypassing of bank security checks, while privacy and copyright laws have struggled to keep pace. Malicious actors use voice cloning to create inflammatory deepfakes of celebrities and politicians that spread rapidly on social media.
OpenAI may launch Voice Engine next week, or it may never launch it. The company has repeatedly said it is considering keeping the service on a smaller scale. But one thing is clear: the limited preview of Voice Engine has become the longest in OpenAI's history, for reasons of both image and safety.