Recently Google releasedGemini The image generation model of 2.0 Flash can communicate with AI through natural language to reach various image generation functions.
I. Methods of use
It's currently available for free by signing in through Google Ai Studio (requires a smooth internet connection):
After logging in, select Gemini 2.0 Flash (Image Generation) Experimental from the Model drop-down menu:
II. Actual measurement
1、Local repainting + style repainting
First Vince draws and asks the AI to dress the character (partial redraw). As you can see, the original shape of the character stays well:
Add new characters next to each other (partial redraws) and pose them differently (character references):
Gemini 2.0 Flash supports modifying local (uploaded) images, and dressing (local repainting) is equally smooth:
Change the image painting style (style redrawing), the actual test of this function on the accuracy of the cue word requires a high degree of accuracy, after a few attempts, success:
2. Expanding the map
The AI was asked to enlarge the image frame (enlargement) and the result was perfect:
3、Light and shadow control
Modifications to the light (light and shadow control), which previously might have required specialized tools like IC-Light, are now a one-sentence job:
Continuing to try other lighting effects, the AI does comply with the cue words better:
Replace the scene (partial repaint) + change the lighting (lighting control):
With all the composite features above, changing the characters is a piece of cake:
4, sub-shot generation
With AI painting, it's a big challenge to maintain consistency. Here we try to change the photo perspective:
Based on the original scene, further changes in angle and movement (split-screen) were made, and the results were more than satisfactory:
Trying again with actress photos and testing from multiple angles, the conclusion was that the AI did understand and implement the character's physical characteristics:
Based on this, the scenery is further changed:
Compared with the previous case, the AI's "brain" ability is further demonstrated here:
Based on such powerful features, the consistency problem in AI video creation will gradually be solved.
III. Conclusion:
1, Gemini 2.0 Flash may not be able to come up with satisfactory results at one time, but it supports continuous dialog and can eventually achieve good results.
2, natural language (no threshold) interaction + "big unified" map generation function, Gemini 2.0 Flash may be the future of AI map generation tools set a new benchmark.
3, AI tools are getting stronger, Photoshop's market will be further compressed!