February 17th. Microsoft OmniParser is an AI tool that parses and recognizes interactive on-screen icons for purely vision-based GUI agents. It was previously paired with GPT-4V to significantly enhance that model's recognition capabilities.
On February 12, Microsoft released the latest version, OmniParser V2.0, on its official website. It can turn OpenAI (4o / o1 / o3-mini), DeepSeek (R1), Qwen (2.5VL) and Anthropic (Sonnet) models into AI agents that can operate computers.
Compared to V1, OmniParser V2 was trained on larger-scale interactive element detection data and icon functional caption data. As a result, it detects small interactable UI elements more accurately and runs inference faster, with latency reduced by 60%.
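To make the 60% figure concrete, here is a small arithmetic sketch. The baseline latency of 1.0 second is a made-up placeholder, not a number from Microsoft or this article; only the 60% reduction comes from the text.

```python
def latency_after_reduction(baseline_s: float, reduction: float) -> float:
    """Latency left over after cutting `reduction` (a fraction, e.g. 0.60)."""
    return baseline_s * (1.0 - reduction)


def speedup(reduction: float) -> float:
    """How many times faster inference becomes for a given latency reduction."""
    return 1.0 / (1.0 - reduction)


# Hypothetical V1 latency in seconds (assumption, for illustration only).
baseline_s = 1.0
new_latency_s = latency_after_reduction(baseline_s, 0.60)
factor = speedup(0.60)
print(f"new latency: {new_latency_s:.2f}s, speedup: {factor:.1f}x")
```

In other words, a 60% latency reduction leaves 40% of the original latency, which corresponds to roughly a 2.5x speedup.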
On ScreenSpot Pro, a high-resolution agent benchmark, OmniParser V2 + GPT-4o reached an accuracy of 39.6%, while GPT-4o on its own scored only 0.8%.
To make it faster to experiment with different agent setups, Microsoft has also open-sourced OmniTool, a Dockerized Windows system that integrates the basic tools an agent needs, covering screen understanding, grounding, action planning and execution. It is a key tool for turning large models into agents.
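The four stages named above (screen understanding, grounding, action planning, execution) can be pictured as a simple loop. The sketch below is purely illustrative and does not use OmniParser's or OmniTool's actual API: every name in it (`UIElement`, `parse_screen`, `plan_action`, `ground`, `run_step`) is hypothetical, the parser returns hard-coded elements, and the "planner" is a keyword match standing in for a large model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    element_id: int
    caption: str                              # short description of the icon
    bbox: Tuple[float, float, float, float]   # normalized (x1, y1, x2, y2)


def parse_screen(screenshot) -> List[UIElement]:
    # Screen understanding: a real parser would detect elements in the
    # screenshot. Hard-coded here (assumption) so the sketch runs offline.
    return [
        UIElement(0, "Search box", (0.10, 0.05, 0.60, 0.10)),
        UIElement(1, "Submit button", (0.65, 0.05, 0.75, 0.10)),
    ]


def plan_action(goal: str, elements: List[UIElement]) -> Optional[int]:
    # Action planning: stand-in for the LLM. Picks the first element whose
    # caption contains any word of the goal.
    words = goal.lower().split()
    for el in elements:
        if any(w in el.caption.lower() for w in words):
            return el.element_id
    return None


def ground(el: UIElement, width: int, height: int) -> Tuple[int, int]:
    # Grounding: map the chosen element to the pixel at its bbox center.
    x1, y1, x2, y2 = el.bbox
    return round((x1 + x2) / 2 * width), round((y1 + y2) / 2 * height)


def run_step(goal: str, screenshot, width: int, height: int):
    elements = parse_screen(screenshot)
    chosen = plan_action(goal, elements)
    if chosen is None:
        return ("none", None)
    point = ground(elements[chosen], width, height)
    # Execution: a real agent would send a click here; we only return it.
    return ("click", point)


print(run_step("press submit", None, 1920, 1080))
```

Keeping the parser and the planner as separate functions mirrors the idea in the article that the same screen parser can be combined with different large models.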