Apple's research team recently published a paper on arXiv titled "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training", which introduces a multimodal large model called "MM1". The model comes in three parameter sizes (3 billion, 7 billion, and 30 billion) and possesses image recognition and natural-language reasoning capabilities.
In the paper, the Apple research team mainly uses the MM1 model for experiments, controlling individual variables to identify the key factors that affect model performance.
The research shows that image resolution and the number of image tokens have a larger impact on model performance, while the design of the vision-language connector has a comparatively small effect. Different types of pre-training data also affect model performance in different ways.
According to the paper, the research team first ran small-scale ablation experiments on model architecture decisions and pre-training data. They then built the MM1 model using a Mixture-of-Experts (MoE) architecture with Top-2 gating, which they report achieved the best results on pre-training metrics and remained competitive after supervised fine-tuning on a range of existing multimodal benchmarks.
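The paper does not include code, but to illustrate the idea: in a Top-2 gated MoE layer, a small router scores every expert for each token, and only the two highest-scoring experts actually process that token. Below is a minimal PyTorch sketch of this routing pattern; the expert count, layer sizes, and renormalization scheme are illustrative assumptions, not MM1's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Minimal Mixture-of-Experts layer with Top-2 gating (sketch only;
    dimensions and expert count are hypothetical, not MM1's config)."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048, num_experts: int = 8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # one logit per expert
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, d_model)
        logits = self.router(x)                    # (num_tokens, num_experts)
        weights, indices = logits.topk(2, dim=-1)  # keep the 2 best experts per token
        weights = F.softmax(weights, dim=-1)       # renormalize over the chosen pair

        out = torch.zeros_like(x)
        for slot in range(2):                      # each token visits exactly 2 experts
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e       # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 512)        # 16 tokens of width 512
print(Top2MoE()(tokens).shape)       # torch.Size([16, 512])
```

The appeal of this design is that total parameter count grows with the number of experts while per-token compute stays roughly constant, since only two experts run for any given token.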
The researchers tested the MM1 model and found that MM1-3B-Chat and MM1-7B-Chat reportedly outperform most models of the same size on the market. The two models performed particularly well on VQAv2, TextVQA, ScienceQA, MMBench, MMMU, and MathVista, though their overall performance still trails Google's Gemini and OpenAI's GPT-4V.