Drawing with ChatGPT: How to Keep Characters Consistent
The output of the DALL-E built into ChatGPT is unstable: characters drift from one image to the next, and the aspect ratio is unpredictable. Today I will show you the simplest way to get a stable aspect ratio and consistent characters, so that you can start producing picture-book content easily, with a lower barrier to entry and quick feedback, whether positive or negative.
1. The problem of controlling aspect ratio
Anyone who has used Midjourney knows that the --ar parameter lets you set the aspect ratio you want, but DALL-E does not seem to offer anything as convenient.
Currently, DALL-E 3 supports three resolutions:
- Square (1024x1024): The default resolution. The system outputs this size unless the prompt asks for something else.
- Landscape (1792x1024): Suitable for landscapes, panoramas, or any other image that needs a horizontal orientation.
- Portrait (1024x1792): Best for full-body portraits, tall structures, or any other image that needs a vertical orientation.
So how do you write a prompt that reliably produces the image size you want? Let's start from scratch.
I had no ideas to begin with, so I asked GPT to come up with some for me.
Then I simply asked it to generate the picture for prompt 2.
As you can see, the image generated directly is a 1024x1024 square. How can we make it vertical? Add the keywords: full body portrait or vertical images.
As you can see, 1024x1792 vertical images are now generated reliably. And how do you get horizontal images? Use the keyword: wide images.
With that, the problem of unstable image size is solved.
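The keyword trick above can be written down as a small helper. This is only a sketch in Python: the names `ORIENTATIONS` and `with_orientation` are made up for illustration, and appending the keyword to the end of the prompt is an assumption about placement. The sizes and keywords are the ones listed above.

```python
from typing import Optional, Tuple, Dict

# Hypothetical lookup table (names are assumptions for illustration).
# Sizes come from the list above; keywords are the singular form of
# the ones used in this article.
ORIENTATIONS: Dict[str, Tuple[str, Optional[str]]] = {
    "square": ("1024x1024", None),  # default size, no keyword needed
    "landscape": ("1792x1024", "wide image"),
    "portrait": ("1024x1792", "vertical image"),
}


def with_orientation(prompt: str, orientation: str = "square") -> Tuple[str, str]:
    """Return (prompt with the orientation keyword appended, expected size)."""
    if orientation not in ORIENTATIONS:
        raise ValueError(f"unknown orientation: {orientation!r}")
    size, keyword = ORIENTATIONS[orientation]
    text = prompt.strip()
    if keyword:
        # Assumption: the keyword is simply appended after a comma.
        text = f"{text}, {keyword}"
    return text, size
```

Calling `with_orientation("a girl walking her dog", "portrait")` returns the prompt to paste into ChatGPT together with the size you should expect back. If the image still comes out square, repeating the keyword or asking for the size explicitly may help.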
2. How to solve the problem of character consistency?
Method 1: Content generated together in the same latent space tends to keep a consistent style.
In plain terms, this means asking DALL-E to generate a single image divided into multiple grid cells. For example:
After that, crop out each cell, upscale it to high definition, and you can start creating.
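The cropping step can be sketched in Python as follows. The function `grid_boxes` is a hypothetical helper, and it assumes the cells are evenly sized; DALL-E does not guarantee a pixel-exact grid, so the boxes may need manual adjustment.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, upper, right, lower), in pixels


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Crop boxes for an evenly divided rows x cols grid, listed row by row."""
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be at least 1")
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            # Integer division keeps neighbouring boxes flush with each other.
            left = c * width // cols
            upper = r * height // rows
            right = (c + 1) * width // cols
            lower = (r + 1) * height // rows
            boxes.append((left, upper, right, lower))
    return boxes
```

Each tuple uses the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so the boxes can be passed to it directly if Pillow is installed. Upscaling is a separate step and is left to whatever tool you prefer.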
Method 2:
If you want to control what each panel shows, you can use the following method:
Use layout keywords in the prompt to divide the image: upper left, lower left, upper right, lower right.
Please note that this is a single image, not the four separate images that DALL-E 3 generates by default.
Prompt template: [Medium] [Layout] [Upper left description] [Upper right description] [Lower left description] [Lower right description]
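The template can be filled in programmatically. The following Python sketch is one possible rendering: the function name `four_panel_prompt` and the exact layout wording are assumptions, not a fixed syntax that DALL-E requires.

```python
def four_panel_prompt(medium: str, upper_left: str, upper_right: str,
                      lower_left: str, lower_right: str) -> str:
    """Fill the [Medium] [Layout] [four panel descriptions] template."""
    # Assumed layout wording; adjust it if the model ignores the split.
    layout = "one image divided into a 2x2 grid of four panels"
    panels = [
        ("Upper left", upper_left),
        ("Upper right", upper_right),
        ("Lower left", lower_left),
        ("Lower right", lower_right),
    ]
    parts = [f"{medium.strip()}, {layout}."]
    for position, description in panels:
        # Strip a trailing period so each sentence ends with exactly one.
        parts.append(f"{position}: {description.strip().rstrip('.')}.")
    return " ".join(parts)
```

For instance, `four_panel_prompt("Watercolor illustration", "a fox wakes up", "the fox eats breakfast", "the fox walks to school", "the fox falls asleep")` produces one prompt in which the same character appears in all four panels.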
Finally, by the same logic, could you have DALL-E generate a whole story in one go?
The picture format determines the size, and a multi-panel grid plus a layout description determines what each panel shows. Can you still say the consistency is poor? Or do you find cropping and upscaling too much trouble?
Of course, the technique shown in the picture above can be extended in many other ways. For example, could you generate several panels as frames of a person dancing, then process them and edit them together?
Or could you generate frame-by-frame images of a person's or a celebrity's facial expression, going from calm to laughing to crying, and so on?
Or the entire sequence of a dragon opening its mouth and breathing fire?
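A frame sequence like these could be requested with a prompt built along the same lines. The Python sketch below is purely illustrative: `frame_strip_prompt` and its wording are assumptions, and there is no guarantee that DALL-E will follow the panel order or count exactly.

```python
from typing import Sequence


def frame_strip_prompt(subject: str, stages: Sequence[str]) -> str:
    """Build one prompt asking for a single wide image with one panel per stage."""
    if len(stages) < 2:
        raise ValueError("need at least two stages to form a sequence")
    count = len(stages)
    # "wide image" is the landscape keyword from section 1.
    header = (
        f"wide image, one image divided into {count} equal panels in a single row, "
        f"the same {subject.strip()} in every panel."
    )
    frames = [f"Panel {i}: {stage.strip()}." for i, stage in enumerate(stages, start=1)]
    return " ".join([header] + frames)
```

For the dragon example, the stages might be "mouth closed", "mouth opening", and "breathing fire". The resulting panels can then be cropped out and assembled into frames.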
Of course, there is far more you can do with DALL-E than this. Go ahead and explore it, young man. AI is a tool, and a tool is there for you to tinker with as you like. The key question is how to apply it to scenarios that make money.