Unified image generation model OmniGen supports text-to-image generation, image editing, and pose detection.

Today I'd like to introduce OmniGen, a unified image generation model proposed by the Beijing Academy of Artificial Intelligence (BAAI). OmniGen can perform a variety of tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. It requires no additional plug-ins or operations: guided by the text prompt, it automatically identifies the relevant features in the input images (e.g., the desired object, human pose, or depth map).

Related links

  • Paper: https://arxiv.org/pdf/2409.11340
  • Code: https://github.com/VectorSpaceLab/OmniGen
  • Demo: https://huggingface.co/spaces/Shitao/OmniGen

Summary

OmniGen is a unified image generation model that generates a wide range of images from multimodal prompts. It is designed to be simple, flexible, and easy to use. The authors have released the inference code so that everyone can explore OmniGen's features.

Existing image generation models often need to load multiple additional network modules (e.g., ControlNet, IP-Adapter, Reference-Net, etc.) and perform additional preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) in order to generate satisfactory images. However, we believe that future image generation paradigms should be simpler and more flexible, i.e., generating various images directly from arbitrary multimodal instructions without additional plug-ins and operations, similar to how GPT works in language generation.

Due to limited resources, OmniGen still has room for improvement. The model will continue to be optimized and hopefully it will inspire more general image generation models. You can also easily fine-tune OmniGen without having to worry about designing a network for a specific task; all you need to do is prepare the appropriate data and run the script. Imagination is no longer limited; everyone can construct any image generation task, and maybe we can achieve really fun, fantastic and creative things.

What can OmniGen do?

OmniGen is a unified image generation model that can perform a variety of tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. OmniGen does not require any additional plug-ins or operations; guided by the text prompt, it automatically identifies the relevant features in the input images (e.g., the desired object, human pose, or depth map).

Demonstration of OmniGen's capabilities: flexible control of image generation.

Referring Expression Generation

You can input multiple images and refer to the objects in them using simple, everyday language. OmniGen automatically identifies the necessary objects in each image and generates a new image based on them. No additional operations such as image cropping or face detection are required.

Method

OmniGen's framework. Text is tokenized, and input images are converted to embeddings by a VAE. OmniGen accepts free-form multimodal prompts and generates images using the rectified flow method.
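
To make the generation step more concrete, here is a minimal sketch of rectified-flow sampling with plain Euler integration. It is not OmniGen's implementation: `euler_sample`, `toy_velocity`, and the step count are hypothetical names and values chosen for illustration, and the toy velocity field stands in for the trained network, which in the real model would also be conditioned on the text tokens and image embeddings.

```python
import numpy as np

def euler_sample(velocity_fn, noise, num_steps=50):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (pure noise) to t=1 (sample)."""
    x = noise.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)  # one Euler step along the predicted velocity
    return x

# Toy stand-in for the trained model (an assumption for this sketch):
# the exact velocity field that carries any starting point to a single target.
target = np.array([1.0, -2.0, 0.5])

def toy_velocity(x, t):
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
sample = euler_sample(toy_velocity, rng.standard_normal(3), num_steps=10)
print(np.allclose(sample, target))
```

With this toy field the straight-line path from noise to target is followed exactly, which is the behavior rectified flow aims for; a learned velocity field only approximates it.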

Examples of OmniGen training data. Inputs from all tasks are standardized into an arbitrarily interleaved image-text sequence format, which serves as the model's prompt. The placeholder |image_i| indicates the position of the i-th image in the prompt.
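
As an illustration of the interleaved format, the sketch below splits a prompt into ordered text and image segments. It assumes the placeholder is written literally as `|image_i|`, as in the caption above; `split_interleaved` and the example prompt are hypothetical and are not taken from the OmniGen codebase.

```python
import re

# Assumed literal placeholder form, following the caption: |image_1|, |image_2|, ...
PLACEHOLDER = re.compile(r"\|image_(\d+)\|")

def split_interleaved(prompt):
    """Return ("text", str) and ("image", int) segments in prompt order."""
    segments = []
    pos = 0
    for match in PLACEHOLDER.finditer(prompt):
        text = prompt[pos:match.start()].strip()
        if text:
            segments.append(("text", text))
        segments.append(("image", int(match.group(1))))  # index of the referenced image
        pos = match.end()
    tail = prompt[pos:].strip()
    if tail:
        segments.append(("text", tail))
    return segments

segments = split_interleaved(
    "The man in |image_1| shakes hands with the woman in |image_2|."
)
for kind, value in segments:
    print(kind, value)
```

A downstream step could then replace each `("image", i)` segment with the embedding of the i-th input image and each text segment with its tokens.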

(a) Illustration of the construction process of the GRIT-Entity dataset. Instance segmentation and inpainting methods are used to acquire a large amount of data. (b) Illustration of the cross-verification strategy used in constructing the web image dataset. For a group photo of Person A and Person B, several images are sampled from the individual photos of Person A and Person B, and an MLLM is asked whether each person appears in the group photo. The group photo is retained only if the "yes" ratio for both Person A and Person B reaches a specific threshold. The individual images labeled "yes" are then used to construct data pairs with the corresponding group photo.
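
The cross-verification filter in (b) can be sketched roughly as follows. This is a hedged illustration only: `cross_verify`, the `appears_in` callback (which stands in for the MLLM query), and the default threshold of 0.5 are assumptions, since the source does not state the actual threshold or interface.

```python
def cross_verify(group_photo, singles_by_person, appears_in, threshold=0.5):
    """Keep the group photo only if every person's "yes" ratio reaches the threshold.

    Returns {person: [individual images answered "yes"]} or None if rejected.
    """
    kept = {}
    for person, singles in singles_by_person.items():
        yes = [img for img in singles if appears_in(img, group_photo)]
        if not singles or len(yes) / len(singles) < threshold:
            return None  # one person fails, so the whole group photo is dropped
        kept[person] = yes
    return kept

# Toy stand-in for the MLLM answers (hypothetical data).
answers = {
    ("a1", "g"): True, ("a2", "g"): True, ("a3", "g"): False,
    ("b1", "g"): True, ("b2", "g"): False,
}
ask = lambda img, group: answers[(img, group)]

result = cross_verify("g", {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}, ask)
# Pair each verified individual image with the retained group photo.
pairs = [(img, "g") for imgs in result.values() for img in imgs]
print(pairs)
```

Raising the threshold makes the filter stricter: in this toy data, Person B's ratio is exactly 0.5, so any threshold above that rejects the group photo.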

More results

Text-to-image generation results.

Subject-driven generation results. OmniGen can generate a new image based on the objects in a reference image. When the reference image contains multiple objects, OmniGen automatically identifies the required ones from the text instructions.

OmniGen results on different image generation tasks.

OmniGen results on a variety of traditional vision tasks.
