Meta releases Sapiens visual model to enable AI to analyze and understand human actions in images/videos

Meta Reality Labs has recently released Sapiens, a new family of AI vision models covering four fundamental human-centric vision tasks: 2D pose estimation, body part segmentation, depth estimation, and surface normal prediction.


The models range from 300 million to 2 billion parameters. They adopt a vision transformer architecture in which all tasks share the same encoder, while each task has its own decoder head.
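The shared-encoder, per-task-head design can be illustrated with a minimal numpy sketch. This is not Meta's implementation: the dimensions, token counts, and head output sizes below are made-up stand-ins chosen only to show how one encoder pass feeds several independent task heads.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions for illustration only; the real Sapiens
# models have 0.3B-2B parameters.
EMBED_DIM = 16   # shared feature dimension
NUM_TOKENS = 8   # patch tokens per image

# A single shared "encoder": here just one linear projection standing
# in for the full vision transformer.
W_encoder = rng.standard_normal((EMBED_DIM, EMBED_DIM))

def encode(tokens):
    """Map patch tokens into the shared feature space."""
    return tokens @ W_encoder

# One lightweight decoder head per task, each with its own weights,
# all consuming the same encoder output. Output sizes are illustrative.
HEAD_DIMS = {
    "pose": 17 * 2,      # e.g. 17 keypoints, (x, y) each
    "segmentation": 28,  # e.g. 28 body-part classes per token
    "depth": 1,          # scalar depth per token
    "normals": 3,        # (nx, ny, nz) per token
}
heads = {name: rng.standard_normal((EMBED_DIM, dim))
         for name, dim in HEAD_DIMS.items()}

tokens = rng.standard_normal((NUM_TOKENS, EMBED_DIM))
features = encode(tokens)  # the expensive encoder runs once
outputs = {name: features @ W for name, W in heads.items()}

for name, out in outputs.items():
    print(name, out.shape)
```

Because the encoder is shared, its features are computed once and reused; swapping tasks only swaps the small decoder head.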

  • 2D Pose Estimation: This task detects and localizes key points of a human body in a 2D image. These key points usually correspond to joints such as elbows, knees, and shoulders, and help in understanding a person’s posture and movements.
  • Body Part Segmentation: This task segments an image into different body parts, such as the head, torso, arms, and legs. Each pixel in the image is classified as belonging to a specific body part, which is useful for applications such as virtual try-on and medical imaging.
  • Depth Estimation: This task estimates the distance of each pixel in the image from the camera, effectively recovering 3D structure from a 2D image. This is crucial for applications such as augmented reality and autonomous driving, where understanding the layout of a space is important.
  • Surface Normal Prediction: This task predicts the orientation of surfaces in an image. Each pixel is assigned a normal vector indicating which direction the surface is facing, which is very valuable for 3D reconstruction and for understanding the geometry of objects in the scene.
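The last two tasks are geometrically linked: if a depth map is treated as a height field, per-pixel surface normals can be derived from its spatial gradients. The sketch below shows that standard relationship with numpy finite differences; it is a generic illustration of the two tasks' connection, not code from the Sapiens models, which predict normals directly.

```python
import numpy as np

def normals_from_depth(depth):
    """Per-pixel surface normals from a depth map via finite differences.

    Treating depth as a height field z = f(x, y), the unnormalized
    normal at each pixel is (-dz/dx, -dz/dy, 1), then scaled to unit
    length.
    """
    dz_dy, dz_dx = np.gradient(depth.astype(float))  # gradients along rows, cols
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth, dtype=float)], axis=-1)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return n

# A planar ramp whose depth increases by 1 per pixel along x:
# every pixel should get the same tilted normal.
depth = np.tile(np.arange(5.0), (4, 1))
n = normals_from_depth(depth)
print(n[0, 0])  # approximately [-0.707, 0., 0.707]
```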


Meta said the models natively support inference at 1K resolution and, because they are pre-trained on more than 300 million in-the-wild human images, are very easy to adapt to individual tasks.

Even when labeled data is scarce or entirely synthetic, the resulting models show excellent generalization to in-the-wild data.
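One reason task adaptation can stay cheap with a strong pre-trained encoder is that, if the encoder is kept frozen, fitting a linear task head reduces to simple regression on fixed features. The sketch below makes that point with synthetic numpy data; the feature matrix and labels are invented for illustration, and real Sapiens fine-tuning of course trains full decoder heads, not a closed-form linear fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for frozen pre-trained encoder features of 100 images
# (purely synthetic; real features would come from the shared ViT).
features = rng.standard_normal((100, 16))

# Synthetic per-image labels for a small 3-channel task, generated
# from a hidden linear map plus a little noise.
true_head = rng.standard_normal((16, 3))
labels = features @ true_head + 0.01 * rng.standard_normal((100, 3))

# With the encoder frozen, "fine-tuning" a linear head is ordinary
# least-squares regression, solvable in closed form.
head, *_ = np.linalg.lstsq(features, labels, rcond=None)

pred = features @ head
mse = np.mean((pred - labels) ** 2)
print(f"training MSE: {mse:.6f}")
```

Even this trivial setup recovers the task mapping almost exactly, which is the intuition behind adapting one strong shared backbone to many downstream tasks with little labeled data.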
