MyShell recently launched OpenVoice, an open-source voice cloning tool that has attracted a lot of attention. The tool was developed in collaboration with the Massachusetts Institute of Technology (MIT), Tsinghua University, and Canadian AI startup MyShell. OpenVoice uses a conceptually simple but highly efficient approach to clone a user's voice almost instantly while using significantly fewer computing resources.
Beyond basic voice cloning, the tool offers fine-grained control over a wide range of voice styles, including accent, emotion, rhythm, pauses and intonation. This means users can generate personalized voice clones with OpenVoice without spending much time or computing resources.
In the authors' informal, non-scientific tests on the HuggingFace platform, OpenVoice produced a fairly convincing voice clone from only a few seconds of arbitrary speech. Unlike other voice cloning applications, OpenVoice does not require the user to read a specific passage aloud: speaking a few seconds of random words is enough to immediately generate a playable voice clone that reads out the provided text prompts.
OpenVoice is built on two main AI models: a text-to-speech (TTS) model and a "tone color converter" model. The TTS model controls "stylistic parameters and language" and was trained on 30,000 sentences of audio samples from two English speakers (American and British accents), one Chinese speaker and one Japanese speaker. The tone color converter, meanwhile, was trained on 300,000 audio samples from more than 20,000 different speakers.
By combining the tone color of the user's recorded audio with the "base speaker" voice of the TTS model, the two models together can replicate the user's voice and adjust the emotional expression with which the text is spoken. Compared with other approaches, including Meta's competing Voicebox, OpenVoice's approach requires significantly less computation to clone a voice.
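To make the idea of decoupling "what is said and how" (the base TTS output) from "whose voice it sounds like" (the tone color) more concrete, here is a deliberately simplified toy sketch. It is not OpenVoice's method: the real tone color converter is a trained neural network. The sketch assumes audio is represented as a list of feature frames and approximates "tone color" as the per-dimension mean of those frames, so conversion becomes a simple mean shift. The function names `tone_color_embedding`, `convert_tone_color` and `clone_voice` are hypothetical.

```python
from statistics import fmean

# Toy representation (assumption): an utterance is a list of feature frames,
# each frame a list of floats of the same length.
Frame = list[float]


def tone_color_embedding(frames: list[Frame]) -> Frame:
    """Toy 'tone color' embedding: the per-dimension mean over all frames."""
    dims = len(frames[0])
    return [fmean(frame[d] for frame in frames) for d in range(dims)]


def convert_tone_color(frames: list[Frame], src_emb: Frame, tgt_emb: Frame) -> list[Frame]:
    """Remove the source speaker's mean and add the target speaker's mean.

    Frame-to-frame variation (standing in for rhythm, pauses and intonation
    produced by the base TTS model) is left unchanged.
    """
    return [[x - s + t for x, s, t in zip(frame, src_emb, tgt_emb)] for frame in frames]


def clone_voice(base_tts_frames: list[Frame], reference_frames: list[Frame]) -> list[Frame]:
    """Two-stage toy pipeline: base-speaker output first, then tone color transfer."""
    src = tone_color_embedding(base_tts_frames)   # tone color of the base speaker
    tgt = tone_color_embedding(reference_frames)  # tone color of the user's short recording
    return convert_tone_color(base_tts_frames, src, tgt)


if __name__ == "__main__":
    base = [[1.0, 2.0], [3.0, 4.0]]        # hypothetical base TTS features
    reference = [[10.0, 10.0], [12.0, 14.0]]  # hypothetical user recording features
    print(clone_voice(base, reference))
```

Running it prints `[[10.0, 11.0], [12.0, 13.0]]`: the converted frames take on the reference recording's average "color" while keeping the base output's frame-to-frame pattern. The split mirrors the article's point that style and language are handled by one model and voice identity by another, so the expensive style model never needs to be retrained for a new speaker.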
MyShell, the company behind OpenVoice, was founded in 2023 and is headquartered in Calgary, Alberta, Canada. It raised a $5.6 million seed round led by INCE Capital, with additional investment from Folius Ventures, Hashkey Capital, SevenX Ventures, TSVC and OP Crypto, and has attracted more than 400,000 users. Through its web application, the startup offers a variety of text-based AI characters and bots, including a number of characters with different "personalities," as well as an animated GIF creation tool and user-generated text-based role-playing games.
Although MyShell has open-sourced OpenVoice, the company still earns revenue from several sources: monthly subscriptions to its web app, fees from third-party bot creators who want to promote their bots within the app, and fees for AI training data. This business model is intended to give MyShell a sustainable financial base while striking a balance between open source and commercial interests.