Zhipu open-sources the next-generation multimodal large model CogVLM2

ZhipuAI recently announced CogVLM2, a new-generation multimodal large model. Compared with the previous-generation CogVLM, the model delivers significant gains on key benchmarks while supporting 8K text context and image resolutions of up to 1344×1344. CogVLM2 improves its score by 32% on the OCRbench benchmark and by 21.9% on the TextVQA benchmark, demonstrating strong document-image understanding. Although CogVLM2 has only 19B parameters, its performance approaches or exceeds that of GPT-4V.


The technical architecture of CogVLM2 builds on the previous-generation model, combining a 5-billion-parameter visual encoder with a 7-billion-parameter visual expert module that models the interaction between visual and language sequences through dedicated parameters. This deep-fusion strategy integrates the visual and language modalities more tightly while preserving the model's strengths in language processing. Moreover, thanks to its carefully designed multi-expert structure, CogVLM2 activates only about 12 billion parameters during inference, which significantly improves inference efficiency.
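The "visual expert" idea described above can be illustrated with a toy sketch: image tokens and text tokens share one sequence, but each modality is projected by its own weight matrix before mixing. This is only a minimal NumPy illustration of the routing concept; the function name, shapes, and weights are assumptions for demonstration and are not from the official CogVLM2 implementation.

```python
import numpy as np

def visual_expert_projection(hidden, is_image, w_text, w_image):
    """Project each token with the language weights or the visual-expert
    weights, selected by a per-token modality mask (illustrative only)."""
    out = np.empty_like(hidden)
    out[~is_image] = hidden[~is_image] @ w_text   # language pathway
    out[is_image] = hidden[is_image] @ w_image    # visual-expert pathway
    return out

rng = np.random.default_rng(0)
d = 8                                   # toy hidden size
seq = rng.standard_normal((6, d))       # 6 tokens: 4 image + 2 text
mask = np.array([True, True, True, True, False, False])
w_txt = rng.standard_normal((d, d))     # hypothetical language weights
w_img = rng.standard_normal((d, d))     # hypothetical visual-expert weights

mixed = visual_expert_projection(seq, mask, w_txt, w_img)
print(mixed.shape)  # (6, 8)
```

Because only one pathway's weights are applied per token, the extra visual-expert parameters add capacity for image tokens without increasing the compute spent on each token, which is consistent with the gap between total (19B) and activated (~12B) parameters.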

In terms of performance, CogVLM2 achieves excellent results on multiple multimodal benchmarks, including TextVQA, DocVQA, ChartQA, OCRbench, MMMU, MMVet, and MMBench. These tests cover a wide range of capabilities, from text and image understanding to complex reasoning and interdisciplinary tasks. The two CogVLM2 models rank first on several of these benchmarks, while on the others they approach the level of closed-source models.

Code repository:

GitHub: https://github.com/THUDM/CogVLM2

Model Download:

Hugging Face: huggingface.co/THUDM

ModelScope community: modelscope.cn/models/ZhipuAI

Wisemodel community: wisemodel.cn/models/ZhipuAI

Demo experience:

https://modelscope.cn/studios/ZhipuAI/Cogvlm2-llama3-chinese-chat-Demo/summary

CogVLM2 Technical Documentation:

https://zhipu-ai.feishu.cn/wiki/OQJ9wk5dYiqk93kp3SKcBGDPnGf
