TencentRelease against Tencentorigin of the universeVincent FigureOpen Source Big Model(hybrid DiT) acceleration library, which claims to dramatically improve inference efficiency and reduce graph generation time by 75%.
Officially, the threshold of using the hybrid DiT model has also been greatly reduced, and users can use the Tencent hybrid Vincennes model capability based on ComfyUI's graphical interface. At the same time, the hybrid DiT model has been deployed to the HuggingFaceDiffusers universal model library, users can call the hybrid DiT model with only three lines of code, without having to download the original code base.
Previously, Tencent announced that the DiT model has been fully upgraded and open-sourced to the public for free commercial use by enterprises and individual developers. Tencent called it the "industry's first" Chinese native open source model of the DiT architecture, supporting bilingual input and comprehension in English and Chinese. It adopts the same DiT architecture as sora, which not only supports text-to-graph, but also serves as the basis for multimodal visualization such as video.
CUDA-enabled NVIDIA GPUs are required to run the model, with a minimum of 11GB of video memory required to run Hybrid DiT alone, and at least 32GB of video memory required to run DialogGen (Tencent's text-to-image multimodal interactive dialog system) and Hybrid DiT together, which Tencent says they have tested with NVIDIA's V100 and A100 on Linux. GPUs on Linux.