Turn DeepSeek and other models into AI agents that control computers in seconds: Microsoft releases open-source tool OmniParser V2.0

February 17: Microsoft OmniParser is an AI tool that lets purely vision-based GUI agents parse and recognize interactive icons on the screen. It was previously paired with GPT-4V, where it significantly improved recognition.


On February 12, Microsoft released the latest version of OmniParser, V2.0, on its official website. It can turn models such as OpenAI (4o / o1 / o3-mini), DeepSeek (R1), Qwen (2.5VL) and Anthropic (Sonnet) into AI agents that can operate a computer.

Compared with V1, OmniParser V2 was trained on larger-scale interactive-element detection data and icon functional caption data. As a result, it detects small interactable UI elements more accurately and runs inference faster, cutting latency by 60%.

On the high-resolution agent benchmark ScreenSpot Pro, OmniParser V2 combined with GPT-4o reached an accuracy of 39.6%, whereas GPT-4o on its own scored only 0.8%.

To make it faster to experiment with different agent setups, Microsoft has also open-sourced OmniTool, a Dockerized Windows system that bundles the basic tools an agent needs, covering screen understanding, element localization, action planning and execution. It is a key tool for turning large models into agents.
