Scientists at ShanghaiTech University have recently developed an artificial intelligence model called CLAY. CLAY is a machine learning model that can generate detailed 3D objects from text descriptions or 2D images. Compared with previous technologies, CLAY achieves significant improvements in the quality and diversity of the 3D objects it generates.
The core of the CLAY model consists of a multi-resolution variational autoencoder (VAE) and a diffusion transformer (DiT). The VAE is responsible for encoding 3D geometries at different levels of detail into the latent space, while the DiT is responsible for generating these geometries. Unlike many other systems, CLAY is able to process 3D content directly without converting it to 2D images first.
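To make the two-stage data flow concrete, here is a toy sketch in Python. It is not CLAY's implementation: `sample_levels`, `toy_encode`, and `toy_generate` are hypothetical stand-ins, the "encoder" merely computes centroids, and the "sampler" uses a simple step-toward-the-prediction rule rather than a real diffusion noise schedule. It only illustrates the general shape described above: a shape is encoded at several levels of detail into one latent vector, and generation starts from noise and refines a latent step by step.

```python
import random

def sample_levels(points, sizes=(4, 8, 16), seed=0):
    """Subsample one point cloud at several resolutions, coarse to fine."""
    rng = random.Random(seed)
    return [rng.sample(points, min(n, len(points))) for n in sizes]

def toy_encode(levels):
    """Toy stand-in for a VAE encoder: one centroid per level, concatenated."""
    latent = []
    for level in levels:
        for axis in range(3):
            latent.append(sum(p[axis] for p in level) / len(level))
    return latent

def toy_generate(predict_clean, dim, steps=20, seed=0):
    """Toy stand-in for latent sampling: start from Gaussian noise and move
    toward the denoiser's prediction of the clean latent at every step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    for remaining in range(steps, 0, -1):
        target = predict_clean(x)  # a trained network would go here
        x = [xi + (ti - xi) / remaining for xi, ti in zip(x, target)]
    return x

# Usage: encode a random point cloud, then "generate" a latent with a
# hypothetical denoiser that always predicts that encoded latent.
rng = random.Random(1)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(32)]
latent = toy_encode(sample_levels(points))
generated = toy_generate(lambda x: latent, dim=len(latent))
```

In a real system the generated latent would then be passed to the VAE decoder to recover geometry. That step is omitted here because the article does not describe the decoder.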
CLAY was trained on more than 500,000 3D models, covering everything from simple everyday objects to complex fantasy creatures. CLAY can also be steered by additional inputs: users can control the generated results precisely by specifying rough shapes (such as voxel structures or point clouds) or bounding boxes. This flexibility allows CLAY to generate entire urban scenes and even reconstruct detailed 3D models from hand-drawn sketches.
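As an illustration of what such coarse control inputs look like as data, the following minimal Python sketch derives a bounding box and a voxel occupancy set from a point cloud. The function names and the grid resolution are assumptions made for this example. How CLAY actually encodes these conditions is not described here.

```python
def bounding_box(points):
    """Return the axis-aligned bounding box as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))

def voxelize(points, resolution=4):
    """Return the set of occupied (i, j, k) cells in a resolution^3 grid
    that spans the bounding box of the points."""
    lo, hi = bounding_box(points)
    occupied = set()
    for p in points:
        cell = []
        for axis in range(3):
            extent = hi[axis] - lo[axis]
            # A flat axis (zero extent) maps every point to cell 0.
            t = (p[axis] - lo[axis]) / extent if extent > 0 else 0.0
            # Clamp so points on the max face land in the last cell.
            cell.append(min(int(t * resolution), resolution - 1))
        occupied.add(tuple(cell))
    return occupied

# Usage: three points on the diagonal of a unit cube, on a 2x2x2 grid.
cloud = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
box = bounding_box(cloud)
cells = voxelize(cloud, resolution=2)
```

Either representation is far cheaper to author than a finished mesh, which is what makes it useful as a rough guide for a generator.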
Compared with other systems (such as Shap-E, DreamFusion, and Wonder3D), CLAY shows clear advantages. Whether the task is text-to-3D or image-to-3D, CLAY generates more consistent geometry, smoother surfaces, and finer details. CLAY is also fast: it produces a high-quality 3D asset in about 45 seconds, whereas some comparable systems may take hours to optimize.
The potential applications of CLAY are wide-ranging and include game development, filmmaking, and 3D printing. The researchers are nonetheless aware of the potential risks of AI-generated virtual content, and they plan to add more safety measures to ensure responsible use.
In the future, the researchers also plan to expand the training data further, improve model quality, and integrate geometry generation and material synthesis into a single model for more comprehensive capabilities. A version of CLAY can be accessed through the 3D generation service Rodin.
Product portal: https://hyperhuman.deemos.com/rodin