Stability AI releases SD3 technical report, revealing more details about the model

Stability AI recently released the technical report for Stable Diffusion 3 (SD3), its strongest image generation model to date, revealing more details about SD3. According to Stability AI, SD3 outperforms all current open-source and commercial models in typographic quality, aesthetic quality, and prompt understanding, making it the strongest image generation model currently available.

Highlights of the technical report are as follows.

Based on human preference evaluations, SD3 outperforms current state-of-the-art text-to-image systems such as DALL-E 3, Midjourney v6, and Ideogram v1.

The report presents a new Multimodal Diffusion Transformer (MMDiT) architecture that uses separate sets of weights for the image and language modalities. This architecture improves the system's text comprehension and spelling capabilities compared with earlier Stable Diffusion versions.

The 8B-parameter SD3 model can run on an RTX 4090 with 24 GB of VRAM. In addition, SD3 will be released in several parameter sizes, ranging from 800M to 8B, to run on consumer hardware.

The SD3 architecture builds on the Diffusion Transformer ("DiT", see Peebles & Xie, 2023). Given the conceptual differences between text embeddings and image embeddings, it uses separate sets of weights for the two modalities. In this way, information can flow between image tokens and text tokens, which improves the overall text comprehension and typographic quality of the model's output.
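To make the "separate weights, joint attention" idea concrete, here is a minimal sketch of such a block in NumPy. The class name, dimensions, and initialization are illustrative assumptions, not the SD3 implementation: each modality gets its own QKV projection, but attention runs over the concatenated token sequence so the two modalities exchange information.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MMDiTBlock:
    """Sketch of a joint-attention block with separate per-modality
    weights (hypothetical shapes, not the actual SD3 code)."""

    def __init__(self, dim, rng):
        # Separate QKV projection matrices for image and text tokens.
        self.img_qkv = rng.normal(0, 0.02, (dim, 3 * dim))
        self.txt_qkv = rng.normal(0, 0.02, (dim, 3 * dim))

    def __call__(self, img, txt):
        qi, ki, vi = np.split(img @ self.img_qkv, 3, axis=-1)
        qt, kt, vt = np.split(txt @ self.txt_qkv, 3, axis=-1)
        # Concatenate along the sequence axis: attention is joint,
        # so information flows between image and text tokens.
        q = np.concatenate([qi, qt])
        k = np.concatenate([ki, kt])
        v = np.concatenate([vi, vt])
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
        out = attn @ v
        n_img = img.shape[0]
        return out[:n_img], out[n_img:]
```

A real block would add layer norm, MLPs, and timestep modulation per modality; the sketch only shows where the weight separation and the joint attention sit.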

SD3 employs the Rectified Flow (RF) formulation, where data and noise are connected on a linear trajectory during training. This results in a straighter inference path, which can be sampled using fewer steps.
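The linear trajectory can be written down directly. The helper names below are illustrative, but the formula is the standard rectified-flow interpolation: the training sample sits on the straight line between data and noise, and the regression target is the constant velocity along that line.

```python
import numpy as np

def rf_interpolate(x0, eps, t):
    """Rectified flow connects data x0 and noise eps on a straight
    line: x_t = (1 - t) * x0 + t * eps, for t in [0, 1]."""
    return (1 - t) * x0 + t * eps

def rf_velocity_target(x0, eps):
    # Along a straight path the velocity is constant: d(x_t)/dt = eps - x0.
    return eps - x0
```

Because the path is straight, the learned velocity field changes slowly along a trajectory, which is why a sampler can take fewer, larger steps at inference time.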

They also studied scaling of the rectified flow Transformer, using a reweighted RF formulation and the MMDiT backbone to train a series of models ranging from 15 Transformer blocks (450 million parameters) to 38 blocks (8 billion parameters).
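As a back-of-the-envelope sanity check of those two data points (the scaling rule here is an assumption, not stated in the report): if the hidden width grows linearly with the number of blocks, total parameters scale roughly with depth cubed, which is consistent with the reported jump from 450M to 8B.

```python
# Assumed scaling rule: width proportional to depth, so parameter
# count grows roughly as depth cubed relative to a base model.
def rel_params(blocks, base_blocks=15):
    return (blocks / base_blocks) ** 3

ratio = rel_params(38)          # ~16x the 15-block model
est_params_b = 0.45 * ratio     # in billions, from the 450M base
print(round(est_params_b, 1))   # ~7.3B: same order as the reported 8B
```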

SD3 also introduces flexible text encoders: by dropping the memory-intensive T5 text encoder (up to 4.7 billion parameters) in the inference phase, SD3's memory footprint can be significantly reduced with little performance loss.
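A minimal sketch of how such an optional encoder can be wired, assuming (as the function name, token counts, and dimensions here do; they are not from the report) that the CLIP features are already projected to the T5 width: when T5 is skipped, its token slot is zero-filled so downstream layers see the same tensor shape either way.

```python
import numpy as np

def build_text_conditioning(clip_tokens, t5_tokens=None, t5_len=77, dim=4096):
    """Sketch of the flexible-encoder idea (shapes are assumptions):
    skipping the heavy T5 encoder at inference zero-fills its slot,
    keeping the conditioning tensor layout unchanged."""
    if t5_tokens is None:
        # T5 dropped to save memory; substitute zeros for its tokens.
        t5_tokens = np.zeros((t5_len, dim))
    return np.concatenate([clip_tokens, t5_tokens], axis=0)
```

The design choice is that the network never has to know whether T5 ran, so one set of weights serves both the low-memory and the full-quality configuration.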

Overall, this technical report from Stability AI reveals the capabilities and design details of SD3, underscoring its leading position in image generation.

Details here: https://stability.ai/news/stable-diffusion-3-research-paper
