Feb. 26, 2025 - Early this morning, Beijing time, Microsoft open-sourced Magma, a foundation model for multimodal AI agents, on its official website. Compared to traditional agents, Magma has multimodal capabilities that span the digital and physical worlds. In addition to automatically processing different types of data such as images, video, and text, Magma has built-in psychological prediction capabilities that enhance its ability to understand the spatial and temporal dynamics of future video frames and to accurately predict the intentions and future behavior of people or objects in a video.
Users can use Magma to automatically place e-commerce orders and check the weather; it can also automatically operate physical robots or help with playing real chess.
According to the official description, Magma is able to help AI-driven assistants or robots understand their surroundings and act accordingly. For example, it can help domestic robots learn how to organize items they have never seen before, or help virtual assistants generate step-by-step user interface navigation instructions for unfamiliar tasks.
Magma is a VLA foundation model (IT Home note: Vision-Language-Action) capable of adapting to new tasks in both digital and physical environments. It learns from massive amounts of publicly available visual and language data, fusing linguistic, spatial, and temporal intelligence to cope with complex tasks and environments in the digital and physical worlds.
Open source link: https://microsoft.github.io/Magma/