Compared with Microsoft's booming AI business, Apple's moves in the field have been much more low-key, but that does not mean Apple has nothing to show. Apple recently released MGIE, a new open source AI model that can edit images based on natural language instructions.
MGIE stands for MLLM-Guided Image Editing: it uses a multimodal large language model (MLLM) to interpret user instructions and perform pixel-level operations. MGIE can understand natural language commands issued by users and carry out Photoshop-style modification, global photo optimization, and local editing.
Apple developed MGIE in collaboration with researchers from the University of California, Santa Barbara, and the work will be presented at the 2024 International Conference on Learning Representations (ICLR), one of the top conferences for artificial intelligence research.
Before diving into MGIE, a word on MLLMs. A multimodal large language model is an AI model that can process text and images together, which makes it well suited to instruction-based image editing. MLLMs have shown strong cross-modal understanding and visual-aware response generation, but they have not yet been widely applied to image editing tasks.
MGIE integrates MLLMs into the image editing process in two ways: First, it uses MLLMs to derive expressive instructions from user inputs. These instructions are concise and provide clear guidance for the editing process.
For example, when you enter "make the sky bluer", MGIE can generate an instruction such as "increase the saturation of the sky region by 20%".
Second, it uses the MLLM to generate a visual imagination, a latent representation of the desired edit. This representation captures the essence of the edit and guides the pixel-level operations. MGIE adopts an end-to-end training scheme that jointly optimizes the instruction derivation, visual imagination, and image editing modules.
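To make this two-stage flow concrete, here is a minimal conceptual sketch. The module names (ExpressiveInstructionDeriver, VisualImaginationEncoder, PixelEditor) and tensor shapes are illustrative placeholders chosen for this article, not Apple's actual code or the MGIE repository's API.

```python
# Conceptual sketch of the MGIE-style two-stage flow described above.
# All module names and shapes are illustrative placeholders.
import torch
import torch.nn as nn

class ExpressiveInstructionDeriver(nn.Module):
    """Stands in for the MLLM that rewrites a terse user command
    into an explicit, expressive editing instruction (step 1)."""
    def forward(self, user_command: str) -> str:
        # A real MLLM would also condition on the input image.
        return f"[expressive instruction derived from: {user_command}]"

class VisualImaginationEncoder(nn.Module):
    """Stands in for the component that turns the expressive
    instruction into a latent 'visual imagination' (step 2)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
    def forward(self, instruction_embedding: torch.Tensor) -> torch.Tensor:
        return self.proj(instruction_embedding)

class PixelEditor(nn.Module):
    """Stands in for the editing module that applies the latent
    guidance to the input image at the pixel level (step 3)."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.film = nn.Linear(dim, 3)  # toy per-channel modulation
    def forward(self, image: torch.Tensor, latent: torch.Tensor) -> torch.Tensor:
        scale = torch.sigmoid(self.film(latent)).view(1, 3, 1, 1)
        return image * scale

# Toy end-to-end pass: user command -> expressive instruction -> latent -> edited image.
deriver, imaginer, editor = ExpressiveInstructionDeriver(), VisualImaginationEncoder(), PixelEditor()
image = torch.rand(1, 3, 256, 256)            # input photo
expressive = deriver("Make the sky bluer")    # step 1: concise -> expressive
latent = imaginer(torch.randn(1, 64))         # step 2: embed it (embedding faked here)
edited = editor(image, latent)                # step 3: pixel-level edit
print(expressive, edited.shape)
```

In the real system the three pieces are trained jointly end to end, so gradients from the editing loss also shape how instructions are derived and how the latent guidance is formed.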
MGIE can handle a wide range of editing situations, from simple color adjustments to complex object manipulations. The model can also perform global and local editing based on the user's preferences. Some of the features and capabilities of MGIE include:
- Expressive instruction-based editing: MGIE can generate clear and concise instructions that effectively guide the editing process, which improves both the quality of the edits and the overall user experience.
- Photoshop-style modification: MGIE can perform common Photoshop-style edits such as cropping, resizing, rotating, flipping, and adding filters. The model can also apply more advanced edits, such as changing backgrounds, adding or removing objects, and blending images.
- Global photo optimization: MGIE can optimize the overall quality of a photo, including brightness, contrast, sharpness, and color balance. The model can also apply artistic effects such as sketching, painting, and comics.
- Local editing: MGIE can edit specific regions or objects in an image, such as faces, eyes, hair, clothing, and accessories. The model can also modify the attributes of these regions or objects, such as shape, size, color, texture, and style.
MGIE is an open source project on GitHub where users can find code, data, and pre-trained models. The project also provides a demo notebook showing how to use MGIE to complete various editing tasks.
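As an illustration only, a call into such a model might look like the sketch below. The load_mgie and edit_image helpers, the checkpoint path, and their arguments are assumptions made for this example, not the repository's actual interface; consult the project's demo notebook for the real usage.

```python
# Hypothetical usage sketch. load_mgie and edit_image are placeholders that
# stand in for the real loading/inference code in the MGIE repository.
from PIL import Image

def load_mgie(checkpoint_path: str):
    """Placeholder: in the real project this would load the pretrained MGIE model."""
    print(f"(pretend) loading MGIE weights from {checkpoint_path}")
    return object()  # dummy model handle

def edit_image(model, image: Image.Image, instruction: str) -> Image.Image:
    """Placeholder: a real call would run the MLLM and editing modules."""
    print(f"(pretend) applying instruction: {instruction!r}")
    return image  # returned unchanged in this sketch

if __name__ == "__main__":
    model = load_mgie("checkpoints/mgie.pt")            # hypothetical path
    photo = Image.new("RGB", (512, 512), "skyblue")     # stand-in for a real photo
    edited = edit_image(model, photo, "Make the sky bluer")
    edited.save("edited.png")
```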