Nvidia launches new AI speech recognition model Parakeet, claiming to be better than Whisper

NVIDIA NeMo, a leading open-source conversational AI toolkit, has announced the Parakeet ASR model family, a series of state-of-the-art automatic speech recognition (ASR) models that transcribe spoken English with high accuracy. Developed in partnership with Suno.ai, the Parakeet models are presented as a breakthrough in speech recognition that paves the way for more natural and efficient human-computer interaction.

According to the developers, the models are robust to non-speech segments such as music and silence, and they outperform OpenAI's Whisper v3. They can also be integrated into projects easily through pre-trained checkpoints.

NVIDIA announced four Parakeet models based on RNN Transducer (RNNT) or Connectionist Temporal Classification (CTC) decoders, with 0.6 billion or 1.1 billion parameters. They cope with a wide variety of audio environments and, after training on only 64,000 hours of data, achieve a lower word error rate (WER) on benchmark datasets than previous models.
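
For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that definition, not the scoring code used for the Parakeet benchmarks; the function name `word_error_rate` is hypothetical, and real evaluations usually normalize case and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A lower value is better, and because insertions count as errors, WER can exceed 1.0 when the transcript contains many spurious words.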

Parakeet RNNT 1.1B - Best recognition accuracy, moderate inference speed. Best used when the most accurate transcription is needed.

Parakeet CTC 1.1B - Fast inference speed and strong recognition accuracy. A good balance between accuracy and inference speed.

Parakeet RNNT 0.6B - Strong recognition accuracy and fast inference speed. Suitable for large-scale inference with limited resources.

Parakeet CTC 0.6B - Fastest inference speed with moderate recognition accuracy. Most useful when transcription speed matters most.

The Parakeet models are robust to non-speech segments, including music and silence, which effectively prevents them from producing hallucinated transcripts. Parakeet is built on the NVIDIA NeMo toolkit, with a focus on ease of use and flexibility. Pre-trained checkpoints are available for direct use, which makes it convenient to integrate the models into a project. Whether you need out-of-the-box inference or want to fine-tune for a specific task, NeMo provides a framework for getting the most out of the models.
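
As a rough illustration of using a pre-trained checkpoint, the sketch below loads a Parakeet model through NeMo and transcribes a list of audio files. It assumes the NeMo toolkit is installed with its ASR collection and that the checkpoints are published under identifiers following the "parakeet-rnnt-1.1b" naming used in the announcement; the other three identifiers, and the helper names `checkpoint_name` and `transcribe_files`, are assumptions made for this example and should be checked before use.

```python
from typing import List

# Assumed checkpoint identifiers, extrapolated from "parakeet-rnnt-1.1b".
PARAKEET_CHECKPOINTS = {
    ("rnnt", "1.1b"): "nvidia/parakeet-rnnt-1.1b",  # most accurate
    ("ctc", "1.1b"): "nvidia/parakeet-ctc-1.1b",    # accuracy/speed balance
    ("rnnt", "0.6b"): "nvidia/parakeet-rnnt-0.6b",  # limited resources
    ("ctc", "0.6b"): "nvidia/parakeet-ctc-0.6b",    # fastest
}


def checkpoint_name(decoder: str, size: str) -> str:
    """Map a decoder type and parameter count to a checkpoint identifier."""
    key = (decoder.lower(), size.lower())
    if key not in PARAKEET_CHECKPOINTS:
        raise ValueError(f"unknown Parakeet variant: {decoder} {size}")
    return PARAKEET_CHECKPOINTS[key]


def transcribe_files(paths: List[str], decoder: str = "rnnt", size: str = "1.1b"):
    """Download the chosen checkpoint and transcribe the given audio files."""
    # Imported here so the helpers above work even without NeMo installed.
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(
        model_name=checkpoint_name(decoder, size)
    )
    # The shape of the returned transcripts depends on the NeMo version.
    return model.transcribe(paths)


# Example call (downloads the model on first use; "sample.wav" is a placeholder):
# transcripts = transcribe_files(["sample.wav"], decoder="ctc", size="0.6b")
```

Keeping the variant choice in a small lookup table makes it easy to switch between the accuracy-oriented and speed-oriented models without touching the transcription code.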

The main advantages of the Parakeet model include:

- State-of-the-art accuracy: Excellent WER performance across a variety of audio sources and domains, and robust to non-speech segments.

- Different model sizes: Models are available with 0.6B and 1.1B parameters, giving them the capacity to handle complex speech patterns.

- Open source and extensible: Built on NVIDIA NeMo, it can be seamlessly integrated and customized.

- Pretrained checkpoints: plug-and-play models that can be used for inference or fine-tuning.

- Permissive license: Released under the CC-BY-4.0 license, so the model checkpoints can be used in commercial applications.

Parakeet is a major advance in conversational AI. Its accuracy, combined with the flexibility and ease of use provided by NeMo, enables developers to create more natural, intuitive voice applications, from more accurate virtual assistants to seamless real-time communication. The Parakeet family of models has achieved state-of-the-art results. Users can try parakeet-rnnt-1.1b for themselves in the Gradio demo. To run the models locally and explore the toolkit, visit the NVIDIA NeMo GitHub page.

Official blog URL: https://nvidia.github.io/NeMo/blogs/2024/2024-01-parakeet/
