Google recently published a blog post announcing that it has open-sourced Magika, an AI-based tool that quickly and efficiently identifies file formats and content types. The source code is hosted on GitHub.
Magika uses a custom, highly optimized deep learning model that can accurately identify file types in milliseconds even when running on a CPU.
Google also shared Magika's performance data. In a benchmark of 1 million files covering more than 100 formats, Magika performed about 20% better than existing tools, and both its precision and its recall exceeded 99%.
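For readers unfamiliar with the two metrics, here is a minimal sketch of how per-format precision and recall could be computed from a labeled benchmark. The function name and the toy labels are illustrative assumptions and are not taken from Google's evaluation.

```python
from collections import Counter

def per_class_precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for a single-label classifier.

    Hypothetical helper for illustration; not part of Magika.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this format
        else:
            fp[pred] += 1    # predicted format was wrong
            fn[truth] += 1   # true format was missed
    result = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        result[label] = (precision, recall)
    return result

# Toy data: one PDF is misidentified as a ZIP archive.
y_true = ["pdf", "pdf", "zip", "zip"]
y_pred = ["pdf", "zip", "zip", "zip"]
scores = per_class_precision_recall(y_true, y_pred)
```

In this toy case every file labeled "pdf" really is a PDF (precision 1.0), but only half of the PDFs were found (recall 0.5). A tool that scores above 99% on both metrics rarely mislabels a file and rarely misses one.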
Internally, Google already uses Magika to strengthen user security. The system has been deployed at scale to route files in Gmail, Drive, and Safe Browsing to the appropriate security and content policy scanners. Compared with the previous system, which relied on manually written rules, Google found that Magika improves the accuracy of file type identification by 50%.
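The routing idea can be sketched as follows. This is a hypothetical illustration, not Google's pipeline: the scanner names are invented, and `detect_type` is a stand-in that checks a few well-known magic bytes, which is exactly the kind of handwritten rule that Magika's deep learning model is meant to replace.

```python
# Hypothetical mapping from detected content type to a scanner name.
ROUTES = {
    "pdf": "document_scanner",
    "zip": "archive_scanner",
    "pebin": "executable_scanner",
}
DEFAULT_SCANNER = "generic_scanner"

def detect_type(data: bytes) -> str:
    """Stand-in detector using magic bytes (assumption; Magika uses a model)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        return "zip"
    if data.startswith(b"MZ"):
        return "pebin"
    return "unknown"

def route(data: bytes) -> str:
    """Pick the scanner that should receive this file's content."""
    return ROUTES.get(detect_type(data), DEFAULT_SCANNER)
```

For example, `route(b"%PDF-1.7 ...")` returns `"document_scanner"`, while content that matches no rule falls back to `"generic_scanner"`. The quality of the routing depends entirely on the detector, which is why a more accurate detector improves the scanners downstream.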
Google said that integrating Magika with VirusTotal will further improve the platform's efficiency and accuracy. Magika will act as a pre-filter before files are analyzed by VirusTotal's Code Insight, which uses Google's generative AI to detect malicious code.