Apple launches all-around visual model 4M-21 that can handle 21 different modalities

Apple and researchers at the École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland have jointly developed a single any-to-any model that can be trained on dozens of highly diverse modalities and co-trained on large-scale multimodal datasets and text corpora. The model, named 4M-21, is trained on 21 different modalities and solves at least three times as many tasks as existing models without any loss of performance.

The study used the 4M pre-training scheme, improving the model's performance and adaptability by scaling up the model and dataset size, increasing the type and number of modalities involved in training, and co-training on multiple datasets. The researchers used different tokenization methods to discretize modalities with different characteristics, such as global image embeddings, human poses, and semantic instances. For the architecture, the study uses a Transformer-based 4M encoder-decoder and adds extra modality embeddings to accommodate new modalities.
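As a rough illustration of this design, the sketch below shows a Transformer encoder-decoder in which every discrete token receives an added, learned per-modality embedding, so a new modality can be supported by registering one more embedding row. The class name, vocabulary size, and dimensions are assumptions for illustration only, not the actual 4M-21 code.

```python
# Hypothetical sketch of the design described above: a Transformer
# encoder-decoder where each token is summed with a learned embedding
# for its modality. Sizes and names are illustrative.
import torch
import torch.nn as nn

class MultimodalEncoderDecoder(nn.Module):
    def __init__(self, vocab_size=16384, num_modalities=21, d_model=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)         # shared token vocabulary
        self.modality_emb = nn.Embedding(num_modalities, d_model)  # one row per modality
        self.backbone = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            batch_first=True,
        )
        self.head = nn.Linear(d_model, vocab_size)  # predicts target-modality tokens

    def forward(self, src_tokens, src_modality, tgt_tokens, tgt_modality):
        # Adding the modality embedding tells the model which modality
        # each token stream belongs to.
        src = self.token_emb(src_tokens) + self.modality_emb(src_modality)
        tgt = self.token_emb(tgt_tokens) + self.modality_emb(tgt_modality)
        hidden = self.backbone(src, tgt)
        return self.head(hidden)

# Example: map 16 tokens of modality 0 (say, RGB) to 16 tokens of modality 3 (say, depth).
model = MultimodalEncoderDecoder()
src = torch.randint(0, 16384, (1, 16))
tgt = torch.randint(0, 16384, (1, 16))
logits = model(src, torch.zeros(1, 16, dtype=torch.long),
               tgt, torch.full((1, 16), 3, dtype=torch.long))
print(logits.shape)  # torch.Size([1, 16, 16384])
```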

The model not only performs a range of common vision tasks out of the box, such as DIODE surface normal and depth estimation, COCO semantic and instance segmentation, and 3DPW 3D human pose estimation, but can also generate any of its training modalities, supports several methods of fine-grained and multimodal generation, and can retrieve RGB images or other modalities by using other modalities as queries. In addition, the researchers conducted multimodal transfer experiments on NYUv2, Hypersim semantic segmentation, and ARKitScenes.
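To make the retrieval claim concrete, here is a minimal, hypothetical sketch of how retrieval with a non-RGB query could work once query and gallery embeddings live in a shared space: rank a gallery of image embeddings by cosine similarity to the query embedding. The embedding extraction itself is assumed and not shown; this is not the paper's code.

```python
# Illustrative cross-modal retrieval: rank gallery embeddings by cosine
# similarity to a query embedding from another modality. Embeddings are
# random stand-ins here.
import torch
import torch.nn.functional as F

def retrieve(query_emb: torch.Tensor, gallery_embs: torch.Tensor, k: int = 5):
    """Return indices of the k gallery entries most similar to the query."""
    query = F.normalize(query_emb, dim=-1)       # (d,)
    gallery = F.normalize(gallery_embs, dim=-1)  # (n, d)
    scores = gallery @ query                     # cosine similarities, shape (n,)
    return torch.topk(scores, k).indices

# Example with a 512-dim query and a gallery of 1000 image embeddings.
query = torch.randn(512)
gallery = torch.randn(1000, 512)
print(retrieve(query, gallery, k=3))
```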

Important functional features include:

Any-to-any modality: Increases the number of modalities from the previous best of 7 for any-to-any models to 21 different modalities, enabling cross-modal retrieval, controllable generation, and strong out-of-the-box performance.

Versatility: Adds support for more structured data such as human poses, SAM instances, and metadata.

Tokenization: Investigates discrete tokenization for different modalities, such as global image embeddings, human poses, and semantic instances, using modality-specific approaches (see the sketch after this list).

Scaling: Extends the model size to 3B parameters and the dataset to 0.5B samples.

Co-training: Co-trains on vision and language simultaneously.
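The sketch referenced in the tokenization item above: a minimal, illustrative way to discretize a continuous modality (such as a global image embedding) by nearest-neighbour lookup in a codebook. The codebook here is random and purely for illustration; 4M-21's actual modality-specific tokenizers are learned and are not reproduced here.

```python
# Toy discretization of continuous features into token ids via a codebook.
# Everything here (codebook size, dimensions) is an illustrative assumption.
import torch

def quantize(features: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each feature vector to the index of its closest codebook entry."""
    dists = torch.cdist(features, codebook)  # (n, codebook_size) pairwise distances
    return dists.argmin(dim=-1)              # discrete token ids, shape (n,)

codebook = torch.randn(1024, 256)  # 1024 codes of dimension 256
features = torch.randn(8, 256)     # 8 feature vectors to discretize
tokens = quantize(features, codebook)
print(tokens)                      # e.g. tensor([ 17, 903, ...])
```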

  • Paper address: https://arxiv.org/pdf/2406.09406