Zhipu AI open-sources video understanding model CogVLM2-Video, which can answer time-related questions

Zhipu AI announced that it has trained and open-sourced a new video understanding model, CogVLM2-Video.

It is reported that most current video understanding models rely on frame averaging and video token compression, which discards temporal information and prevents them from accurately answering time-related questions. Meanwhile, models trained specifically on temporal question-answering datasets are often restricted to narrow formats and domains, sacrificing broader question-answering ability.
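The frame-averaging limitation described above can be shown with a toy example: two clips containing the same frames in opposite temporal order produce identical averaged features, so a model that only sees the average cannot distinguish them. This is purely an illustrative sketch, not code from CogVLM2-Video.

```python
# Toy illustration: averaging per-frame features destroys temporal order.

def average_features(frames):
    """Average a list of per-frame feature vectors element-wise."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]

clip = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # frames in forward order
reversed_clip = clip[::-1]                    # same frames, reversed order

# Both orderings collapse to the same averaged representation,
# so "what happened first?" becomes unanswerable from this input.
assert average_features(clip) == average_features(reversed_clip)
```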


▲ Official demonstration of the model's results

Zhipu AI proposed an automated temporal grounding data construction method based on visual models, generating 30,000 time-related video question-answering examples. Building on this new dataset together with existing open-domain question-answering data, the team introduced multi-frame video images and timestamps as encoder inputs and trained the CogVLM2-Video model.
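The idea of feeding multi-frame images together with their timestamps to the encoder can be sketched roughly as follows. This is a minimal, hypothetical illustration: the function names, token format, and sampling scheme are assumptions for clarity, not CogVLM2-Video's actual API.

```python
# Hypothetical sketch: uniformly sample frames from a video and pair each
# frame with its timestamp, mimicking "multi-frame images + timestamps as
# encoder inputs". All names here are illustrative.

def sample_frames_with_timestamps(duration_s, fps, num_frames=8):
    """Pick `num_frames` evenly spaced frame indices and their timestamps (s)."""
    total = int(duration_s * fps)
    step = max(total // num_frames, 1)
    indices = [i * step for i in range(num_frames) if i * step < total]
    timestamps = [round(i / fps, 2) for i in indices]
    return indices, timestamps

def build_encoder_input(timestamps):
    """Interleave timestamp markers with per-frame placeholders as text tokens,
    so the encoder sees *when* each frame occurs, not just its content."""
    return "".join(f"<t={ts}s><frame_{k}>" for k, ts in enumerate(timestamps))

indices, stamps = sample_frames_with_timestamps(duration_s=60.0, fps=30)
prompt = build_encoder_input(stamps)  # e.g. "<t=0.0s><frame_0><t=7.5s>..."
```

Because the timestamp tokens travel with the frames, questions like "what happens at 15 seconds?" remain answerable, which frame averaging alone cannot support.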

Zhipu AI said that CogVLM2-Video not only achieves state-of-the-art performance on public video understanding benchmarks, but also excels at video caption generation and temporal grounding.

Attached related links:
