Preface
AIGC-related content has been very popular recently, so I want to ride the wave. This is the first article on the topic; it introduces AIGC and the main creation tools around it.
AIGC
AIGC stands for "AI-Generated Content". It is a new mode of content creation, following professionally generated content (PGC, Professional-Generated Content) and user-generated content (UGC, User-Generated Content).
One of the milestones of AIGC entering the public eye was the digital painting "Théâtre D'opéra Spatial" ("Space Opera Theatre"), generated with Midjourney, which made headlines in early September last year:
It won first place in the art competition at the Colorado State Fair in the United States. Even after learning it was AI-generated, the judges stood by their decision, saying the work deserved the award regardless. Once the news was reported, it sparked widespread discussion both inside and outside art circles.
Later, a bilibili uploader used Midjourney to generate an image for each line of the lyrics of "Kill That Shijiazhuang Man" (a song by the band Omnipotent Youth Society), and the video became popular. Many uploaders then released similar works under the theme "every line of the lyrics is drawn by AI", such as "Young and Beautiful", "The Lone Brave Man", and "Seven Miles of Fragrance". That is when I started paying attention to this field. Of course, my understanding at the time was still limited to AI painting (txt2img: you input text and the computer turns it into an image). Looking back now, the field of generated content is much broader.
Through AIGC, even a non-professional like me with no painting background can create very satisfying works.
AIGC's main creative tools
Below I list the creation tools that I consider important, in chronological order.
DALL-E
In January 2021, OpenAI launched the DALL-E model, which uses a 12-billion-parameter version of the GPT-3 Transformer to understand natural-language input and generate corresponding images. It was launched mainly for research, so access was limited to a small number of beta users. The model is unstable, has an incomplete grasp of details, and can make serious logical or factual errors, but as a pioneer it deserves a specific mention.
Alongside DALL-E, OpenAI also released CLIP (Contrastive Language-Image Pre-training). CLIP is a neural network that, given an input image, returns the caption that best matches it. It does the opposite of DALL-E: it maps images to text, while DALL-E maps text to images. CLIP was introduced to learn the connection between the visual and textual representations of objects.
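To make CLIP's role concrete, here is a minimal sketch of image-text matching using the Hugging Face transformers wrapper (my choice for illustration; OpenAI's original release shipped its own repository). The image path and candidate captions are invented for the example:

```python
# Score an image against candidate captions with CLIP.
# The checkpoint name is OpenAI's published ViT-B/32 weights.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("cat.jpg")  # hypothetical local image
captions = ["a photo of a cat", "a photo of a dog", "a space opera painting"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds the similarity of the image to each caption;
# softmax turns the scores into probabilities.
probs = outputs.logits_per_image.softmax(dim=1)
for caption, p in zip(captions, probs[0]):
    print(f"{p.item():.3f}  {caption}")
```

This "image in, best caption out" direction is exactly the scoring signal that later diffusion tools use to steer generation toward a prompt.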
Disco Diffusion
Disco Diffusion is a deep-learning model based on diffusion + CLIP that was open-sourced in October 2021. It generates images from input text. It usually runs on the Google Colab platform, so there is nothing to set up locally: no powerful computer is required, and everything runs in the browser.
Below are the sample images released by artist and designer Somnai when the project was open-sourced:
In actual use it handles scenery, subjects, and styles well, but characters come out relatively poorly. After Somnai joined Midjourney, the project stopped being updated.
DALL-E 2
In April 2022, OpenAI released DALL-E 2, an upgraded version of DALL-E that can also re-edit generated images. New users now have to pay before they can generate images, so I have not tried it myself; I have only followed it through the posts on its official Instagram account. You can, however, try it through Bing: https://www.bing.com/create/
Compared with the two tools described below, the images it generates are relatively plain and literal.
MidJourney
Midjourney's v1 was released in February 2022, and it took off with the v3 release in July 2022.
Its strengths are all-round capability and strong artistic quality; the output closely resembles work made by human artists, and images are generated quickly. In the early days, many artists used Midjourney as a source of inspiration. In addition, because Midjourney runs inside Discord, it has a very good community environment and user base.
Its second wave of popularity came with the release of v5 in March this year. Officially, this version significantly improves the realism of generated people (including details such as fingers) and also advances prompt comprehension, aesthetic diversity, and language understanding.
New users can no longer generate images for free; a subscription is required. I won't demonstrate it here, but I can share two pieces of experience:
- If you don't know how to write accurate, useful prompts, you can generate them from sites like extended-reading link 5; there are many similar websites. A hedged example of what a prompt looks like follows this list.
- Becoming a Midjourney expert takes a lot of skill. Search for related articles and videos online, such as extended-reading links 9 and 10 (and of course read the official documentation).
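As a hedged illustration of what a structured prompt looks like (the /imagine command and the --ar/--v parameters are documented Midjourney features; the subject text is just something I made up):

```
/imagine prompt: a lone lighthouse on a stormy coast, dramatic lighting,
oil painting style, highly detailed --ar 16:9 --v 5
```

Here --ar sets the aspect ratio and --v selects the model version; the comma-separated descriptors before the flags carry the actual content and style.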
Stable Diffusion
In August 2022, Stable Diffusion was open sourced.
The Stable Diffusion algorithm is based on the latent diffusion model (LDM, Latent Diffusion Model) proposed in December 2021, which in turn builds on the diffusion model (DM, Diffusion Model) proposed in 2015 (the text conditioning relies on a Transformer, the architecture Google introduced in 2017). Hence the "Diffusion" in the name; my guess is that "Stable" signals that the algorithm has become stable.
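For intuition, the core DM idea can be summarized in two lines (written here in the common DDPM notation, which is my choice of presentation rather than anything from the original papers' exact wording):

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\right)$$

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\!\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

The forward process $q$ gradually corrupts an image with Gaussian noise according to a variance schedule $\beta_t$; the learned reverse process $p_\theta$ removes the noise step by step. The LDM's contribution was to run this loop in a compressed latent space rather than pixel space, which is a large part of why SD is fast enough for consumer hardware.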
First, a point of confusion about this project needs clearing up. It is open source, and if you go looking on GitHub you will find three repositories with essentially the same name:
- https://github.com/CompVis/stable-diffusion
- https://github.com/runwayml/stable-diffusion
- https://github.com/Stability-AI/stablediffusion
Here is how they relate: the CompVis machine-vision and learning group at LMU Munich wrote the paper; Runway, an AI video-editing startup, contributed expertise to implement the first version; and the startup Stability AI funded the work and pushed Stable Diffusion into the mainstream (its repository now hosts version 2). So we only need to pay attention to the third project.
At runtime, SD treats image generation as a "diffusion" process: it starts from pure noise and gradually refines the image, guided at each step by CLIP's correlation score between the image and the text, until no noise remains and the result approaches the given text description. For the details of how this works, see extended-reading link 8.
SD can produce high-definition, high-fidelity images in a wide range of styles in just a few seconds. Its biggest breakthrough is that anyone can download and use the open-source code for free, instead of paying for a cloud service the way Midjourney and DALL-E require.
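As a minimal sketch of that "download it and run it yourself" workflow, here is text-to-image generation with Hugging Face's diffusers library (one common route, not the only one; the prompt, step count, and v1.5 checkpoint are my illustrative choices, and a CUDA GPU is assumed):

```python
# Text-to-image with Stable Diffusion via the diffusers library.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # the Runway-published v1.5 weights
    torch_dtype=torch.float16,         # half precision to fit consumer GPUs
).to("cuda")

image = pipe(
    "a watercolor painting of a mountain village at dawn",
    num_inference_steps=30,  # how many denoising steps to run
    guidance_scale=7.5,      # how strongly to follow the prompt
).images[0]
image.save("village.png")
```

Each of the 30 steps is one iteration of the denoising loop described above: the model predicts and subtracts a bit of noise, nudged toward the text description.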
Stable Diffusion XL
At the moment, SD's two most annoying shortcomings are:
- Requires very long prompts
- Its handling of human anatomy is flawed; unnatural poses and body structures are common
In April 2023, Stability AI released a beta of Stable Diffusion XL and said it will be open-sourced once training finishes and the parameters stabilize; both of the shortcomings above are said to be improved.
Comparison between MidJourney and Stable Diffusion
First, I want to point out that AI drawing is highly random and stylized. Even with a fairly precise prompt, changing the seed can flip the outcome, so a direct head-to-head comparison is hard. The comparison below is therefore indirect:
- Price. Midjourney is, after all, a for-profit service, and deploying SD on your own server costs far less. SD wins.
- Friendliness. Midjourney is newbie-friendly and usable right after registration. SD, by contrast, requires some technical background; it is fair to say most designers and artists cannot deploy it on their own. Midjourney wins by a small margin.
- Features. Besides supporting everything Midjourney does, SD supports inpainting and custom models. SD wins.
- Control over details. The difference is like Apple (Midjourney) versus Android (SD). Midjourney is a closed commercial product: you cannot see the principles or code behind it, so controllability is poor and details are hard to optimize (sometimes the more you tweak, the worse it gets). SD is open source, with a strong community of models and extensions; it supports private local deployment, precise local fine-tuning, and style control. SD wins hands down.
- Prompting style. Midjourney takes natural-language input (you state your requirements directly in text), while SD takes various kinds of weighted prompt keywords, which demand a lot of skill from the person writing them (see the sketch after this list). Midjourney wins slightly.
- Overall output. I think Midjourney's images are a bit more refined; as a non-algorithm developer, my impression is that SD currently lags in training data and methodology. Midjourney wins by a small margin.
- Preferred styles. Midjourney emphasizes expressiveness and detail rendering, while Stable Diffusion leans more realistic. For artistic creation, Midjourney is better; if you already have concrete, specific needs, SD is better.
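To make the "weighted prompt" point above concrete, here is what SD prompting typically looks like in the popular AUTOMATIC1111 WebUI (my example front end; nothing in this post depends on it). The (keyword:weight) syntax raises or lowers a term's influence, and the negative prompt goes in a separate field:

```
Prompt: masterpiece, best quality, (cinematic lighting:1.3), red coat, snowy street
Negative prompt: lowres, bad anatomy, extra fingers
```

Compare this keyword-and-weight style with Midjourney's plain-sentence input, and the gap in beginner-friendliness is obvious.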
Note that all the products above rest on the same underlying Diffusion-model principle; they differ in how they are productized. Personally, though, I am more optimistic about SD's future development (otherwise I would not be writing a dedicated series on it 😋).
Further reading
1. https://search.bilibili.com/all?vt=33108793&keyword=%E6%AF%8F%E4%B8%80%E5%8F%A5%E6%AD%8C%E8%AF%8D%E9%83%BD%E7%94%B1AI%E4%BD%9C%E7%94%BB
2. https://zh.wikipedia.org/wiki/DALL-E
3. https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb
4. https://www.midjourney.com/home/
5. https://stariu.com/midjourney
6. https://github.com/Stability-AI/stablediffusion
7. https://clipdrop.co/stable-diffusion
8. https://stable-diffusion-art.com/how-stable-diffusion-work/
9. https://www.uisdc.com/midjourney-5
10. https://www.uisdc.com/midjourney-4