Wuhan University, together with China Mobile's Jiutian AI team and Duke Kunshan University, has open-sourced VoxBlink2, an audio-visual speaker recognition dataset of more than 110,000 hours mined from YouTube. The dataset contains 9,904,382 high-quality audio clips and their corresponding video clips from 111,284 YouTube users, making it currently the largest publicly available audio-visual speaker recognition dataset. Its release aims to enrich the open-source speech corpus landscape and support the training of large voiceprint models.
The VoxBlink2 dataset is mined through the following steps:
Candidate preparation: Collect multilingual keyword lists, retrieve user videos, and select the first minute of each video for processing.
Face extraction & detection: Extract video frames at a high frame rate and use MobileNet to detect faces, ensuring that the video track contains only a single speaker.
Face recognition: A pre-trained face recognizer checks each frame to ensure that the audio and video clips come from the same person.
Active speaker detection: Using lip-movement sequences together with the audio, a multimodal active speaker detector outputs the utterance segments, and overlapped-speech detection removes segments that contain multiple speakers (a simplified sketch of this filtering pipeline follows the list).
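The per-video filtering logic can be pictured roughly as below. This is only an illustrative sketch, not the authors' code: `detect_faces`, `face_similarity`, `run_asd`, the `Frame`/`Segment` containers, and the similarity threshold are hypothetical stand-ins for the MobileNet face detector, the pre-trained face recognizer, and the multimodal active speaker detector described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class Frame:
    t: float        # timestamp in seconds
    image: object   # decoded video frame

@dataclass
class Segment:
    start: float    # utterance start, seconds
    end: float      # utterance end, seconds

def filter_video(frames: Sequence[Frame],
                 audio: object,
                 detect_faces: Callable,      # stand-in for the MobileNet face detector
                 face_similarity: Callable,   # stand-in for the pre-trained face recognizer
                 run_asd: Callable,           # stand-in for the multimodal active speaker detector
                 owner_template: object,      # reference face of the channel owner
                 sim_threshold: float = 0.6) -> List[Segment]:
    """Keep only segments where exactly one face appears in every frame, that face
    matches the channel owner, and the owner is actively speaking."""
    kept = []
    for seg in run_asd(frames, audio):                        # candidate utterance segments
        seg_frames = [f for f in frames if seg.start <= f.t <= seg.end]
        faces = [detect_faces(f.image) for f in seg_frames]
        if any(len(fs) != 1 for fs in faces):                 # single-speaker constraint
            continue
        sims = [face_similarity(fs[0], owner_template) for fs in faces]
        if min(sims) < sim_threshold:                         # same-identity constraint
            continue
        kept.append(seg)
    return kept
```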
To further improve data accuracy, a bypass step that builds an in-set face recognizer was also introduced: through coarse face extraction, face verification, face sampling, and training, labeling accuracy improved from 72% to 92%.
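One way to read this bypass step is as restricting verification to the in-set identities and re-checking every candidate clip against them. The sketch below is an assumption about how such a check could look (one cosine prototype per user); the actual VoxBlink2 recognizer is a trained model, and all names and the threshold here are illustrative.

```python
import numpy as np

def fit_inset_recognizer(embeddings: np.ndarray, user_ids: np.ndarray) -> dict:
    """Build one length-normalized prototype per user from sampled face embeddings."""
    prototypes = {}
    for uid in np.unique(user_ids):
        proto = embeddings[user_ids == uid].mean(axis=0)
        prototypes[uid] = proto / np.linalg.norm(proto)
    return prototypes

def verify_clip(clip_embedding: np.ndarray, claimed_uid,
                prototypes: dict, threshold: float = 0.5) -> bool:
    """Accept a clip only if its face embedding is close enough to the claimed user."""
    emb = clip_embedding / np.linalg.norm(clip_embedding)
    return float(emb @ prototypes[claimed_uid]) >= threshold

# Toy usage with random vectors standing in for real face embeddings.
rng = np.random.default_rng(0)
embs = rng.normal(size=(10, 128))
uids = np.array(["user_a"] * 5 + ["user_b"] * 5)
protos = fit_inset_recognizer(embs, uids)
print(verify_clip(embs[0], "user_a", protos))
```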
VoxBlink2 also open-sources voiceprint models of different sizes, including 2D convolutional models based on ResNet, a temporal model based on ECAPA-TDNN, and a very large ResNet293 model equipped with a Simple Attention Module. With post-processing, these models achieve an EER of 0.17% and a minDCF of 0.006% on the Vox1-O test set.
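For context, EER and minDCF are the standard speaker-verification metrics behind those numbers. A minimal, generic implementation (not the authors' evaluation code), run here on synthetic scores rather than real Vox1-O trial pairs, might look like this:

```python
import numpy as np

def eer_and_mindcf(scores, labels, p_target=0.01, c_miss=1.0, c_fa=1.0):
    """scores: similarity per trial; labels: 1 = same speaker, 0 = different speaker."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, int)
    thresholds = np.sort(np.unique(scores))
    # false negative rate: target trials rejected; false positive rate: non-target trials accepted
    fnr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    fpr = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    eer_idx = np.argmin(np.abs(fnr - fpr))
    eer = (fnr[eer_idx] + fpr[eer_idx]) / 2
    # normalized detection cost function, minimized over thresholds
    dcf = c_miss * p_target * fnr + c_fa * (1 - p_target) * fpr
    dcf_norm = dcf / min(c_miss * p_target, c_fa * (1 - p_target))
    return eer, dcf_norm.min()

# Toy usage: well-separated synthetic target and non-target score distributions.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 0.5, 1000), rng.normal(-1.0, 0.5, 1000)])
labels = np.concatenate([np.ones(1000), np.zeros(1000)])
print(eer_and_mindcf(scores, labels))
```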
Dataset website: https://VoxBlink2.github.io
How to download the dataset: https://github.com/VoxBlink2/ScriptsForVoxBlink2
Meta files and models: https://drive.google.com/drive/folders/1lzumPsnl5yEaMP9g2bFbSKINLZ-QRJVP
Paper address: https://arxiv.org/abs/2407.11510