Kuaishou open-sources image generation model Kolors, with support for rendering text in images

Kuaishou has made a big move: it has open-sourced its own image generation model, Kolors (known in Chinese as "Ketu"). This is no ordinary model. It is trained on billions of text-image pairs, uses a General Language Model (GLM) as its text encoder, supports prompts in both Chinese and English, and handles a context of up to 256 tokens.

Kolors Features at a Glance:

  • Chinese and English bilingual support: A General Language Model (GLM) serves as the text encoder, so the model handles English prompts and can also understand and follow Chinese prompts.
  • Long text processing: With a context length of up to 256 tokens, creators can describe their ideas in detail, whether a complex scene or a rich story.
  • Massive training data: Trained on billions of text-image pairs, the model draws on a broad knowledge base and can generate diverse, accurate images.
  • Optimization for Chinese cultural elements: The model has been specifically tuned for Chinese cultural elements, so generated images better match Chinese cultural characteristics and localization needs.
  • Chinese text generation: Kolors can not only understand Chinese but also render Chinese characters inside the generated image, which makes the images more expressive.

In my testing, rendering Chinese text inside images works well and the output is generally correct. English text, however, is prone to missing or incorrect characters.

In my test images of a lying cat, the version with Chinese text came out completely fine, but when I changed the text to "AIbase", some characters were missing. For Chinese output, Kolors performs remarkably well, but keep the text short: longer strings are more likely to contain mistakes.

This model is more than a simple tool: it is backed by Kuaishou's engineering resources. Because it is trained on massive data and specifically tuned for Chinese cultural elements, the images it generates have a more distinctly Chinese character. That makes it a cultural contribution as well as a technical one.

The open-source plan also includes support for ControlNet (CN), LoRA (low-rank adaptation), IP-Adapter (IPA, image prompt adapter), and direct ComfyUI integration, all intended to make the creative workflow smoother and more customizable.

Technical details:

  • Kolors is based on the SDXL model architecture and integrates ChatGLM, with a 256-token context, to strengthen bilingual understanding and text generation.
  • Note that running the model requires a large amount of video memory (VRAM), about 19GB, so it places real demands on the hardware.
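
A back-of-the-envelope calculation suggests why a figure of about 19GB is plausible. The sketch below is only an illustration: the parameter counts (roughly 6 billion for the GLM text encoder and roughly 2.6 billion for an SDXL-style U-Net) and the use of 16-bit weights are my assumptions, not numbers from the announcement, and the function names are made up for this example.

```python
# Rough estimate of the memory needed just to hold model weights.
# ASSUMPTION: the parameter counts below are illustrative round numbers,
# not figures published in this article.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "fp16") -> float:
    """Return the weight footprint in GB, using 1 GB = 1e9 bytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def total_weight_memory_gb(parts: dict, dtype: str = "fp16") -> float:
    """Sum the weight footprint of every component in `parts`."""
    return sum(weight_memory_gb(n, dtype) for n in parts.values())


# Hypothetical component sizes (assumed, see the note above).
components = {
    "text encoder (GLM, ~6B params assumed)": 6.0e9,
    "U-Net (SDXL-style, ~2.6B params assumed)": 2.6e9,
}

for name, params in components.items():
    print(f"{name}: {weight_memory_gb(params):.1f} GB in fp16")
print(f"weights only: {total_weight_memory_gb(components):.1f} GB in fp16")
```

Under these assumptions the weights alone come to roughly 17GB in 16-bit precision. The image decoder, intermediate activations, and framework overhead come on top of that, which is in the same range as the 19GB quoted above. This is an estimate, not a measurement.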

By open-sourcing Kolors, Kuaishou is contributing to the technical community and giving creators more freedom. The release shows Kuaishou's commitment and capability in AI, and hints at how much AI could bring to artistic creation.

Kolors (Ketu) official website: https://top.aibase.com/tool/kuaishouketudamoxingkolors

Project address: https://www.1ai.net/12103.html
