Groq launches the whisper-large-v3 model, supporting voice transcription and translation, free to use

Groq has launched the latest Whisper Large-V3 model. Users can try it in the Playground or call it through the API from their own projects to transcribe and translate speech. The model transcribes multiple languages at very high speed and can translate speech in other languages into English.

Playground link: https://console.groq.com/playground

Currently, users can try this feature for free in the Playground, where transcribing a 4-minute-30-second video takes only about 3 seconds, roughly 90 times faster than real time. Groq also provides an API so that users can integrate the model into their own projects.

The Whisper API is designed to be compatible with OpenAI's API and exposes two core functions: speech-to-text and speech translation. Developers can integrate these functions into their own applications, whether they are building an intelligent assistant or an automated translation system.

In terms of performance, the Whisper API uses the advanced "whisper-large-v3" model to deliver strong results in speech-to-text and translation tasks.

In addition, the API has clear requirements for audio file formats and sizes: it accepts mp3, mp4, wav, and other common formats, and the file size must not exceed 25 MB. Of particular note, for files containing multiple audio tracks the Whisper API processes only the first track, so users should perform any necessary audio preprocessing before uploading.
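The format and size rules above can be checked before uploading. The following is a minimal sketch, not Groq's own validation logic. The function name `check_audio` is hypothetical, the extension set contains only the three formats named in this article (the full list is longer), and reading "25 MB" as 25 × 1024 × 1024 bytes is an assumption.

```python
import os

# Assumption: "25 MB" is read as 25 MiB; the service may count bytes differently.
MAX_BYTES = 25 * 1024 * 1024
# Only the formats named in the article; other common formats are also accepted.
KNOWN_EXTENSIONS = {".mp3", ".mp4", ".wav"}


def check_audio(filename: str, size_bytes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in KNOWN_EXTENSIONS:
        problems.append(f"extension not in the known list: {extension or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_BYTES}-byte limit")
    return problems
```

For a file on disk, the size can be supplied with `os.path.getsize(path)`, for example `check_audio(path, os.path.getsize(path))`.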

To improve the quality and efficiency of transcription, the Whisper API downsamples audio on the server side to mono at 16,000 Hz. Groq recommends that users perform this preprocessing step on the client side, which reduces the file size and therefore allows longer audio to fit within the upload limit.
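One way to do this client-side preprocessing is with the ffmpeg command-line tool. The sketch below assumes ffmpeg is installed and on the PATH; the function names are hypothetical. It keeps only the first audio stream (matching the first-track behavior described earlier), mixes it down to mono, and resamples it to 16,000 Hz.

```python
import subprocess


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Build an ffmpeg command that writes a mono 16 kHz copy of src to dst."""
    return [
        "ffmpeg",
        "-y",             # overwrite dst if it already exists
        "-i", src,        # input file
        "-map", "0:a:0",  # keep only the first audio stream of the input
        "-ac", "1",       # one channel (mono)
        "-ar", "16000",   # 16,000 Hz sample rate
        dst,              # output format is inferred from the extension
    ]


def downsample(src: str, dst: str) -> None:
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
```

Calling `downsample("meeting.mp4", "meeting.wav")` would produce a smaller audio-only file ready for upload. The file names here are placeholders.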

API endpoints:

Speech to text: https://api.groq.com/openai/v1/audio/transcriptions

Speech translation: https://api.groq.com/openai/v1/audio/translations
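As a rough illustration of how these endpoints might be called, the sketch below builds a multipart POST request using only the Python standard library. It has not been verified against the live service. It relies on the stated OpenAI compatibility, so the form field names `file` and `model`, the Bearer token header, and a `text` field in the JSON response are all assumptions. The helper names are hypothetical.

```python
import uuid
import urllib.request

BASE_URL = "https://api.groq.com/openai/v1/audio"
MODEL = "whisper-large-v3"


def encode_multipart(fields: dict, filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(f"{header}{value}\r\n".encode())
    file_header = (
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode())
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(task: str, filename: str, audio: bytes, api_key: str) -> urllib.request.Request:
    """task is 'transcriptions' (speech to text) or 'translations' (to English)."""
    if task not in ("transcriptions", "translations"):
        raise ValueError(f"unknown task: {task}")
    body, content_type = encode_multipart({"model": MODEL}, filename, audio)
    return urllib.request.Request(
        f"{BASE_URL}/{task}",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": content_type,
        },
        method="POST",
    )
```

To send the request, pass the result to `urllib.request.urlopen` and parse the response body as JSON. Under the OpenAI-compatible format assumed here, the transcript would be found under the `text` key.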
