Meta releases Sapiens vision models to let AI analyze and understand human actions in images and videos
Meta Reality Labs has recently released Sapiens, a family of AI vision models for four fundamental human-centric vision tasks: 2D pose estimation, body-part segmentation, depth estimation, and surface normal prediction. The models range in size from 300 million to 2 billion parameters. They use a vision transformer architecture in which all tasks share the same encoder, while each task has its own decoder head. 2D pose estimation detects and localizes key points of the human body in a 2D image; these key points typically correspond to joints such as elbows, knees, and shoulders.
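The shared-encoder, per-task-decoder layout described above can be sketched in a few lines. This is a hypothetical toy illustration, not Meta's actual code: the `encoder` stands in for the shared vision-transformer backbone, and each head stands in for a task-specific decoder.

```python
# Toy sketch (hypothetical, not Sapiens' real implementation) of a shared
# encoder whose features feed multiple task-specific decoder heads.

def encoder(image):
    # Stand-in for the shared vision-transformer encoder:
    # map an image (nested list of pixel values) to a feature vector.
    flat = [px for row in image for px in row]
    return [sum(flat) / len(flat), max(flat), min(flat)]

def pose_head(features):
    # Toy decoder head: emit one (x, y) "keypoint" per feature,
    # standing in for joint locations (elbows, knees, shoulders, ...).
    return [(round(f, 2), round(f / 2, 2)) for f in features]

def depth_head(features):
    # Toy decoder head: emit a single scalar "depth" estimate.
    return sum(features) / len(features)

# One encoder pass is shared; each task runs only its own head.
image = [[0.1, 0.2], [0.3, 0.4]]
feats = encoder(image)
keypoints = pose_head(feats)   # 2D pose estimation output
depth = depth_head(feats)      # depth estimation output
```

The design choice this illustrates is amortization: the expensive encoder runs once per image, and lightweight per-task heads reuse its features, which is why one backbone can serve all four tasks.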