ZhipuAI recently announced CogVLM2, a new generation of its multimodal large model. Compared with the previous-generation CogVLM, the model significantly improves key performance indicators and supports text lengths of up to 8K and image resolutions of up to 1344×1344. CogVLM2 improves by 32% on the OCRbench benchmark and by 21.9% on the TextVQA benchmark, showing strong document image understanding capabilities. Although CogVLM2 has a model size of only 19B, its performance approaches or exceeds that of GPT-4V.
The architecture of CogVLM2 is an optimized version of the previous generation's. It includes a 5-billion-parameter visual encoder and a 7-billion-parameter visual expert module, which models the interaction between visual and language sequences in fine detail through its distinctive parameter settings. This deep fusion strategy integrates the visual and language modalities more closely while preserving the model's strengths in language processing. In addition, thanks to its carefully designed multi-expert module structure, CogVLM2 activates only about 12 billion parameters during inference, which significantly improves inference efficiency.
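The idea of routing tokens through modality-specific parameters can be illustrated with a minimal sketch. This is not CogVLM2's actual implementation: the function name `modality_routed_projection`, the toy dimensions, and the random weights are all assumptions made for illustration. It only shows the general pattern in which image tokens and text tokens are each projected with their own weight matrix, so a given token uses only one expert's parameters.

```python
import numpy as np


def modality_routed_projection(hidden, is_image, w_text, w_image):
    """Project each token with the weight matrix of its own modality.

    hidden:   (seq_len, d_model) token representations
    is_image: (seq_len,) boolean mask, True for image tokens
    w_text:   (d_model, d_out) weights used for text tokens (hypothetical)
    w_image:  (d_model, d_out) weights used for image tokens (hypothetical)
    """
    out = np.empty((hidden.shape[0], w_text.shape[1]), dtype=hidden.dtype)
    # Image tokens only ever touch the visual expert's weights ...
    out[is_image] = hidden[is_image] @ w_image
    # ... and text tokens only touch the language weights.
    out[~is_image] = hidden[~is_image] @ w_text
    return out


# Toy example: 3 image tokens followed by 3 text tokens (sizes are arbitrary).
rng = np.random.default_rng(0)
d_model = 8
hidden = rng.standard_normal((6, d_model))
is_image = np.array([True, True, True, False, False, False])
w_text = rng.standard_normal((d_model, d_model))
w_image = rng.standard_normal((d_model, d_model))

projected = modality_routed_projection(hidden, is_image, w_text, w_image)
print(projected.shape)
```

In a design like this, both sets of weights count toward the total model size, but each token is processed by only one of them, which is one way a model's activated parameter count can be smaller than its total parameter count.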
In terms of model performance, CogVLM2 has achieved excellent results on multiple multimodal benchmarks, including TextVQA, DocVQA, ChartQA, OCRbench, MMMU, MMVet, and MMBench. These tests cover a wide range of capabilities, from text and image understanding to complex reasoning and interdisciplinary tasks. The two CogVLM2 models achieve leading performance on several of these benchmarks, while on the others they reach a level close to that of closed-source models.
Code repository:
GitHub: https://github.com/THUDM/CogVLM2
Model Download:
Hugging Face: huggingface.co/THUDM
ModelScope Community: modelscope.cn/models/ZhipuAI
wisemodel Community: wisemodel.cn/models/ZhipuAI
Online demo:
https://modelscope.cn/studios/ZhipuAI/Cogvlm2-llama3-chinese-chat-Demo/summary
CogVLM2 Technical Documentation:
https://zhipu-ai.feishu.cn/wiki/OQJ9wk5dYiqk93kp3SKcBGDPnGf