Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

aboutStable diffusion 3The advantages will not be elaborated here, here we will mainly talk about how ordinary users can deploy it locally.

Currently, the SD3 model has been open sourced in HuggingFace. The address is:https://huggingface.co/stabilityai/stable-diffusion-3-medium

However, to download the model, you need to log in to your Hugging Face account and sign a license agreement.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

After signing, you can see the file list of the entire project.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

The above file can be roughly divided into three parts:

1. comfy_example_workflows

This part mainly contains three Comfyui workflow files, which will be used below.

The reason why Comfyui is used is that the current (as of the time of article publication) Stable diffusion webui does not support SD3, so it is not possible to run SD3's large model. I also tried to load sd3_medium.safetensors with the current Stable diffusion webui, but it reported this error.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Comfyui currently supports it, but you need to pay attention to upgrading the kernel version. Lower versions are not supported. As you can see from the picture below, the June 11 version has just started to support SD3 (it can be seen that Comfyui received the news very early).

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

I personally don't recommend just upgrading to version 8c4a9be, because the subsequent release notes actually contain a lot of updates about SD3, which shows that 8c4a9be is just a relatively rough version. I personally upgraded directly to the latest version.

2. text_encoders

There are four model files in text_encoders, namely:

├── text_encoders/ │ ├── clip_g.safetensors │ ├── clip_l.safetensors │ ├── t5xxl_fp16.safetensors │ └── t5xxl_fp8_e4m3fn.safetensors

There are three different text encoders here, namely two CLIP models and a T5 model. The T5 model also provides two quantization versions, one 16-bit quantization (fp16) and the other 8-bit quantization (fp8).

Let me add here,CLIP model[1]Originally developed by OpenAI, it is a multimodal pre-trained model that can understand the relationship between images and text. CLIP is trained on a large number of image and text pairs to learn a representation method that can align text descriptions with image content. This representation method enables CLIP to understand the content of the text description and match it with the image content.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

In simple terms,CLIP[2]It is to convert our prompt words into a "language" (vector) that the SD model can understand, so that the SD model knows what kind of pictures to generate.

3. Checkpoints

This time, SD3 released a total of four large models, namely:

├── sd3_medium.safetensors (4.34G) ├── sd3_medium_incl_clips.safetensors (5.97 G) ├── sd3_medium_incl_clips_t5xxlfp8.safetensors (10.9G) └── sd3_medium_incl_clips_t5xxlfp16.safe tensors (15.8G)

sd3_medium.safetensors is a relatively pure base model that only contains the relevant weights of MMDiT and VAE, but does not contain any of the text encoders mentioned in point 2 above.

sd3_medium_incl_clips.safetensors is a combination of sd3_medium + clip_g + clip_l. This model will be smaller, but the performance of the model without the T5XXL encoder will be different.

sd3_medium_incl_clips_t5xxlfp8.safetensors is a combination of sd3_medium + clip_g + clip_l + t5xxl_fp8_e4m3fn, a model that strikes a balance between quality and resource requirements.

sd3_medium_incl_clips_t5xxlfp16.safetensors is a combination of sd3_medium + clip_g + clip_l + t5xxl_fp16. The quality should be higher, but the video memory occupied is also the highest.

Next, enterPracticeIf you have already installed Comfyui, you can skip the first part.

1. Install Comfyui

The first step is to download the installation package. Here I have prepared two forms: Quark Cloud Disk and Baidu Cloud Disk

# Quark Cloud Disk is the installation package provided by Qiuye, so it contains more content https://pan.quark.cn/s/64b808baa960#/list/share/377bd955c75a411c8d1d01f366255cdb-ComfyUI*101aki # Baidu Cloud Disk is downloaded and uploaded by myself, and only an integrated package is put in it for the convenience of friends who don’t have Quark Cloud Disk Link: https://pan.baidu.com/s/103QhzN5R7m-19-JFW6Z9Yw Extraction code: 6666

The second step is to unzip and install. Double-click the launcher. The layout of the launcher is basically the same as the previous SD launcher.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

After the startup is successful, click one-key start, and this screen appears, which means you are halfway there.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

The third step is to upgrade the kernel version. If you have already upgraded, you can skip it.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

I have upgraded to the latest version, including the extended version.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

There are some errors when I start it (probably due to the version upgrade), but after actual testing, it does not affect the output of SD3.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI
Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

2. Download the model

We don't need to use all the above models this time, we only need to use these four models:

  • sd3_medium.safetensors
  • clip_g.safetensors
  • clip_l.safetensors
  • t5xxl_fp8_e4m3fn.safetensors

After downloading, put sd3_medium.safetensors into the models\checkpoints folder, and the other three into the models\clip folder.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

3. Import workflow

The one used here is sd3_medium_example_workflow_basic.json under the comfy_example_workflows folder

Click the folder icon in the upper left corner and a list of workflow files will pop up.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Click Import to import the previous sd3_medium_example_workflow_basic.json file

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

If you import the workflow first and don't put the model, you may get this prompt, which actually tells you that the model failed to load, or your kernel version is not upgraded.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Secondly, you may also need to adjust the video memory optimization options, otherwise the image may fail to be drawn (depending on your video memory situation)

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Finally, click the "Add prompt word queue" button to generate the picture.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

4. Test results

I have made a simple comparison of facial painting, hand painting, font writing and typesetting, full-body photos, animals and plants. I used SDXL (with Refiner) and SD3 for the comparison, and controlled the image size to be 1024. However, there are slight differences in some detail parameters (CFG Scale). Please see the VCR below.

Note: The photos on the left are all generated by SDXL, and the photos on the right are all generated by SD3

4.1 Face painting

Prompt words: happy indian girl,portrait photography,beautiful,morning sunlight,smooth light,shot on kodak portra 200,film grain,nostalgic mood,

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Personal opinion: SDXL is slightly inferior, because it looks like a little girl, but the face looks like a middle-aged woman, which is not a good match.

4.2 Hand painting

Prompt words: a girl sitting in the cafe, playing guitar, comic, graphic illustration, comic art, graphic novel art, vibrant, highly detailed, colored, 2d minimalistic

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Personal opinion: SDXL is still a little bit worse, the legs are obviously problematic, the hands of SD3 are slightly better, but not that amazing

4.3 Fonts and Typesetting

Clue provided by SD3: A vibrant street wall covered in colorful graffiti, the centerpiece spells "SD3 MEDIUM", in a storm of colors

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Previous prompt: Epic anime artwork of a wizard atop a mountain at night casting a cosmic spell into the dark sky that says "Stable Diffusion 3" made out of colorful energy

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Personal opinion: From the above two sets of pictures, we can see that the font writing ability of SD3 is indeed much higher than that of SDXL, but it cannot be guaranteed to be perfect.

4.4 Full body photo

Prompt words: a full body portrait of an old hipster man with a ponytail, cigar in mouth, smoke, badass

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Personal opinion: I actually wanted them to draw a full-body photo here. SD3 is a little better for understanding, but I prefer SDXL in terms of style.

Prompt words: Editorial portrait, full body, 1male, dynamic pose, futuristic fashion, cinematic,

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Personal opinion: It feels similar, but SD3 draws the whole body

4.5 Plants

Prompt words: a frozen cosmic rose, the petals glitter with a crystalline shimmer, swirling nebulas, 8k unreal engine photorealism, ethereal lighting, red, nighttime, darkness, surreal art

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

4.6 Anthropomorphic Animals

Prompt words: full body, cat dressed as a Viking, with weapon in his paws, battle coloring, glow hyper-detail, hyper-realism, cinematic

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

4.7 Some rollovers of SD3

You may have seen some cases of SD3 crashing on the Internet. I also encountered this when testing. Sometimes when drawing some people, the drawings do appear strange or even wrong. It is said that this is because of their NSFW Filter(Filter out non-compliant adult content), and judge all human images as NSFW, resulting in the accidental deletion of some harmless adult images.

For example, this group of prompt words: A girl lying on the grass

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Personal opinion: This set of SDXL is obviously better, the neck and hands of SD3 feel weird

There is also this group: A couple lying on the beach in the sun

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

Both sides of this group are obviously not drawn well, and I also found a point that when drawing two people hugging or other actions between them, the hands of the SD model are particularly prone to errors. I think this is a point that they can improve in the future.

This is also a classic case.

Generate pictures with AI and teach you how to deploy Stable Diffusion 3 locally with ComfyUI

There is one last point, which was also discovered during testing, that is, the images generated by SD3 are obviously more vivid, and the exposure is just right, while the images generated by SDXL are darker and a bit underexposed.

From the above test, we can see that the overall effect of SD3 is better than that of SDXL, which is reflected in theDetails, colors, lightingThe exposure will be better.Close to real photos; In addition, the ability to understand fonts, typesetting, and prompts is also stronger, but there are still some shortcomings in some aspects. I hope that it can be continuously optimized in the future.

 

statement:The content is collected from various media platforms such as public websites. If the included content infringes on your rights, please contact us by email and we will deal with it as soon as possible.
Encyclopedia

What is undressing AI? What is AI "one-click undressing"?

2024-6-15 11:58:46

Encyclopedia

What is AI Agent? A review of domestic AI Agent industry applications

2024-6-16 11:05:50

Search