Have you gotten used to using ChatGPT to communicate in text? Now it has mastered a new skill - generating ultra-realistic images directly!
Upgrade Highlights
At the heart of this upgrade is a brand new image generation feature built into ChatGPT, driven by the latest GPT-4o model. Unlike earlier image generators, GPT-4o does not rely on the traditional diffusion model. Instead it employs an autoregressive approach similar to how a person writes: it starts from the top-left corner of the image and works toward the bottom-right corner, with each step conditioned on the content already drawn. This approach dramatically improves the accuracy of detail and the rendering of text.
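To make the idea of autoregressive, top-left-to-bottom-right generation concrete, here is a toy sketch in Python. It is not how GPT-4o is implemented (those details are not public). The function names `generate_raster` and `toy_next_value` are hypothetical, and a simple random rule stands in for the model. It only shows the pattern described above: cells are filled in raster order, and each new value is computed from the values generated before it.

```python
import random


def generate_raster(height, width, next_value, seed=0):
    """Fill a height x width grid in raster order (top-left to bottom-right).

    next_value(context, rng) receives every value generated so far,
    in generation order, and returns the value for the next cell.
    """
    rng = random.Random(seed)
    grid = [[None] * width for _ in range(height)]
    order = []  # coordinates in the order they were generated
    for row in range(height):
        for col in range(width):
            # The context holds only cells that already exist.
            context = [grid[r][c] for (r, c) in order]
            grid[row][col] = next_value(context, rng)
            order.append((row, col))
    return grid, order


def toy_next_value(context, rng):
    """Hypothetical stand-in for a model: stay close to the previous value."""
    if not context:
        return rng.randint(0, 255)
    step = rng.randint(-10, 10)
    # Clamp to the 0..255 range of an 8-bit grayscale pixel.
    return max(0, min(255, context[-1] + step))


grid, order = generate_raster(3, 4, toy_next_value)
for row in grid:
    print(row)
```

A real model would replace `toy_next_value` with a learned predictor over image tokens, but the control flow stays the same: nothing is drawn without looking at what has already been drawn.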
Gabriel Goh, a research lead at OpenAI, revealed that the process was iterated on for nearly a year, with hundreds of human trainers involved in correcting details to improve the AI's drawing.
How does it work?
Currently, there are two main channels for using GPT-4o's image generation capabilities:
- ChatGPT: If the options have been updated when you select the drawing function in ChatGPT, you are using GPT-4o rather than the previous DALL-E 3. You can describe what you need in text, let ChatGPT generate the image, and even refine and modify it step by step through dialog.
- Sora Website: The full multimodal capabilities of GPT-4o have been incorporated into Sora. One benefit of using it there is that it is extremely fast and may not have the traffic limitations of ChatGPT. However, images generated on Sora may not yet support multiple rounds of modification through dialog.
This update is available to all ChatGPT Free, Plus, Pro, and Team users. Note, however, that the free version still has a daily limit on the number of images that can be generated (DALL-E previously allowed 3 per day; the limit for GPT-4o has not yet been announced but is expected to be similar).
Currently, GPT-4o generates images slightly slower than the previous DALL-E 3, but OpenAI says the delay is totally worth it because "the improvement in image quality and knowledge integration far outweighs the inconvenience of waiting a few seconds."
Let's start with a simple example in ChatGPT:
The Chinese text renders very well! It completely overturns the old impression of garbled Chinese text in AI-generated images!
What are the advantages and disadvantages of GPT-4o's "Draw" feature?
GPT-4o's image generation capability brings many surprising enhancements, mainly in the following areas:
- More Accurate Details and Complex Binding: GPT-4o excels at binding attributes to objects. In the past, it was difficult for AI to accurately draw many objects that each have their own colors and shapes, but GPT-4o can accurately handle 10-20 objects and their attributes, so the image shows more accurate details and meets the needs of complex scenes.
- A Leap in Text Generation, Goodbye to Garbled Characters: In the past, AI-generated text on images often had typos, garbled characters, and other problems that hurt usability. GPT-4o specifically addresses this pain point by reliably generating clear and accurate text. Whether it's a restaurant menu, a scientific diagram, or a branding poster, ChatGPT can now produce it in one click with results comparable to those of professional designers.
- Stronger Knowledge Integration for High-Quality Science Content on Demand: GPT-4o can generate images that match real-world knowledge by drawing directly on the knowledge of the large model. With a simple prompt, such as "Newton's prism experiment", it can accurately reproduce the experiment without any further explanation.
- New Multi-Round Generation: Image generation is now native to GPT-4o, so you can progressively refine images through natural dialog while keeping the content consistent. For example, when designing a game character, the character's appearance stays consistent across multiple iterations and adjustments.
- Strong Instruction Following: GPT-4o's image generation follows very detailed prompts with great attention to detail.
- Excellent In-Context Learning Capabilities: GPT-4o can analyze and learn from user-uploaded images, seamlessly integrating their details into the context that guides subsequent image generation. For example, you can upload a stylized illustration and ask GPT-4o to generate a different object in the same style.
- Photorealistic Effects and a Variety of Styles: Trained on a large number of images in different styles, GPT-4o can create or transform images convincingly. Whether it's a Monet-style cat or a fantasy-style dolphin subway, it can handle it with ease.
- Smarter and More Efficient: The built-in image generation function lets GPT-4o connect its knowledge of text and images, making it smarter and more efficient at image generation.
Of course, the image generation of GPT-4o is not perfect, and there are still some drawbacks:
- Long images may be cropped incorrectly
- The model can hallucinate and make things up
- It has difficulty accurately rendering more than 20 different concepts
- Multilingual text rendering may not be accurate enough for non-Latin scripts (e.g. Chinese)
- Requests to edit specific parts of an image may produce errors
Nevertheless, the emergence of GPT-4o's native multimodal generation, with its nearly mature quality, heralds a new era of image generation. With such high generation quality and silky-smooth multimodal dialog, the question is no longer whether it can be used, but whether it can revolutionize the existing AI image generation ecosystem and the way people interact with AI drawing tools. This is undoubtedly the age of AI, and the age of all of us.
What business value can there be beyond entertainment?
The new GPT-4o is not limited to entertainment and science content; it is also a new tool for business, for example:
- Design teams can quickly generate brand logos and images with transparent backgrounds
- Restaurant owners can make menus and promotional posters in a minute
- Office workers can directly generate presentations and high-quality diagrams
- Educators and writers can quickly generate science illustrations
- Useful images such as menus and wedding invitations can be generated from a conversation
- Ready-to-use cocktail recipes, pizza-making flowcharts, infographics on the impulse-momentum theorem, and more can be generated based on real-world knowledge
All in all, GPT-4o's powerful image generation capability can reduce the dependence on professional drawing tools and designers, and greatly improve the efficiency of content creation and marketing.
Security Issues and Reflections
While users enjoy the power of GPT-4o, OpenAI takes security and copyright issues very seriously and has taken a number of measures:
- Generating pornographic content or inappropriate images of children is prohibited
- Removing watermarks and imitating the works of living artists are prohibited
- All generated images contain C2PA metadata that labels them as AI-generated, making the source easy to trace
- OpenAI has licensed training data from Shutterstock and others, and also provides an "opt-out" mechanism for artists' content to protect copyright and ensure compliance