The biggest problem with AI image generation models is speed: generating a single image with ChatGPT or Stable Diffusion can take several minutes. Even Meta CEO Mark Zuckerberg complained about image generation speed at last year's Meta Connect conference.
The Hugging Face team is trying to speed that up with a new model called aMUSEd, which can generate images in just a few seconds.
This lightweight text-to-image model is based on Google's MUSE model and has roughly 800 million parameters, small enough to deploy on devices such as mobile phones. Its speed comes from how it is built: instead of the latent diffusion used in Stable Diffusion and other image generation models, aMUSEd uses an architecture called a Masked Image Model (MIM).
According to the Hugging Face team, MIM reduces the number of inference steps, improving both generation speed and interpretability, and the model's small size makes it faster still.
aMUSEd paper page: https://huggingface.co/papers/2401.01808
You can try aMUSEd via the demo on Hugging Face. The model is currently available as a research preview, but it is released under an OpenRAIL license, which means it can be freely experimented with and tweaked, and is also friendly to commercial adaptation.
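If you would rather run it locally, the model is also integrated into the diffusers library. The snippet below is a minimal sketch, assuming the AmusedPipeline class and the amused/amused-256 checkpoint published alongside the release; check the model card for the exact identifiers and recommended settings.

```python
from diffusers import AmusedPipeline

# Minimal text-to-image sketch; the model id and step count are assumptions
# based on the published checkpoints, so verify them against the model card.
pipe = AmusedPipeline.from_pretrained("amused/amused-256")
pipe = pipe.to("cuda")

image = pipe(
    "a photo of a red fox in the snow",
    num_inference_steps=12,  # MIM needs only a handful of refinement steps
).images[0]
image.save("fox.png")
```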
The quality of images generated by aMUSEd still has room for improvement, and the team openly acknowledges this, saying it is releasing the model to "encourage the community to explore non-diffusion frameworks like MIM for image generation."
According to the Hugging Face team, aMUSEd can perform zero-shot image inpainting, which Stable Diffusion XL cannot do.
As for how the images are generated in seconds: the MIM approach in aMUSEd is similar to techniques used in language modeling, where certain parts of the data are hidden (masked) and the model learns to predict the hidden parts. In aMUSEd's case, it is the image rather than the text that gets masked.
During training, the Hugging Face team uses a VQGAN (Vector Quantized Generative Adversarial Network) to convert each input image into a series of tokens. The image tokens are then partially masked, and the model learns to predict the masked portion from the unmasked tokens and the prompt, which is processed by a text encoder. During inference, the text prompt is converted into a format the model understands by the same text encoder; aMUSEd starts from a set of masked tokens and gradually refines the image.
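To make that training recipe concrete, here is a conceptual sketch of a single MIM training step. This is not the actual aMUSEd code: vqgan, text_encoder, and transformer stand in for the real components, and MASK_ID and mask_rate are illustrative values.

```python
import torch
import torch.nn.functional as F

MASK_ID = 8192      # hypothetical id of the special [MASK] token
mask_rate = 0.6     # fraction of image tokens hidden in this step (illustrative)

def mim_training_step(image, prompt_ids, vqgan, text_encoder, transformer):
    # 1. The VQGAN turns the image into a sequence of discrete token ids.
    with torch.no_grad():
        tokens = vqgan.encode(image)            # (batch, seq_len) integer ids
        text_emb = text_encoder(prompt_ids)     # (batch, text_len, dim)

    # 2. Randomly mask a subset of the image tokens.
    mask = torch.rand_like(tokens, dtype=torch.float) < mask_rate
    masked_tokens = tokens.masked_fill(mask, MASK_ID)

    # 3. The transformer predicts the original id of every masked position,
    #    conditioned on the unmasked tokens and the text embedding.
    logits = transformer(masked_tokens, text_emb)  # (batch, seq_len, vocab)

    # 4. Cross-entropy loss is computed only on the masked positions.
    loss = F.cross_entropy(logits[mask], tokens[mask])
    return loss
```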
During each refinement, the model predicts parts of the image, retains the parts it is most confident about, and continues to refine the rest. After a certain number of steps, the model’s predictions are processed through the VQGAN decoder to generate the final image.
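The inference loop can be sketched in the same spirit. The code below is a simplified illustration of confidence-based unmasking with a linear reveal schedule (the real models follow a cosine-style schedule); transformer, text_encoder, and vqgan_decoder are again placeholders.

```python
import torch

MASK_ID = 8192  # hypothetical id of the special [MASK] token

@torch.no_grad()
def mim_generate(prompt_ids, text_encoder, transformer, vqgan_decoder,
                 seq_len=256, num_steps=12):
    text_emb = text_encoder(prompt_ids)
    tokens = torch.full((1, seq_len), MASK_ID)   # start with every token masked

    revealed = 0
    for step in range(num_steps):
        logits = transformer(tokens, text_emb)   # (1, seq_len, vocab)
        confidence, prediction = logits.softmax(dim=-1).max(dim=-1)

        # Only positions that are still masked are candidates for reveal.
        still_masked = tokens == MASK_ID
        confidence = confidence.masked_fill(~still_masked, -1.0)

        # Decide how many tokens should be visible after this step, then
        # keep the most confident new predictions and re-mask the rest.
        target = seq_len if step == num_steps - 1 else (seq_len * (step + 1)) // num_steps
        num_new = target - revealed
        if num_new > 0:
            top = confidence.topk(num_new, dim=-1).indices
            tokens.scatter_(1, top, prediction.gather(1, top))
            revealed = target

    # Decode the completed token sequence back into pixels.
    return vqgan_decoder(tokens)
```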
aMUSEd can also be fine-tuned on custom datasets. Hugging Face showed a model fine-tuned with an 8-bit Adam optimizer and float16 precision, using less than 11GB of GPU VRAM.
The training script for model fine-tuning can be accessed here:
https://github.com/huggingface/diffusers/blob/main/examples/amused/train_amused.py
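As a rough illustration of the memory-saving combination mentioned above (an 8-bit Adam optimizer plus float16 precision), a fine-tuning loop could be wired up roughly as follows. This is a generic sketch rather than the training script itself: model, compute_loss, and dataloader are placeholders for the transformer, the masked-token loss, and your custom dataset.

```python
import torch
import bitsandbytes as bnb

# model, compute_loss, and dataloader are placeholders (see the note above).
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)  # 8-bit Adam states
scaler = torch.cuda.amp.GradScaler()                          # for float16 training

for batch in dataloader:
    optimizer.zero_grad()
    with torch.autocast("cuda", dtype=torch.float16):
        loss = compute_loss(model, batch)      # masked-token prediction loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```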