Zero123++ is an advanced image-conditioned diffusion model highlighted in this paper, focused on generating 3D-consistent multi-view images from a single input view. The model minimizes fine-tuning effort by leveraging pre-trained 2D generative priors, particularly from Stable Diffusion. Noteworthy improvements include tiling six views into a single image, shifting the noise schedule, implementing scaled reference attention for local conditioning, and introducing FlexDiffuse for global conditioning. Through these enhancements, Zero123++ produces high-quality, consistent multi-view images, overcoming issues like texture degradation and geometric misalignment. The paper showcases the model's effectiveness through qualitative and quantitative comparisons with leading models, emphasizing its potential for various applications. Additionally, a depth-controlled version of Zero123++, built with ControlNet, is introduced, highlighting the model's versatility.
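To make the tiling trick concrete, here is a minimal, runnable sketch of packing six generated views into one image so the diffusion model denoises them jointly; the 3x2 grid of 320x320 views is an illustrative assumption, not necessarily the exact Zero123++ layout.

```python
import numpy as np

def tile_views(views, rows=3, cols=2):
    """Tile a list of HxWx3 view images into one (rows*H) x (cols*W) x 3 image."""
    assert len(views) == rows * cols
    h, w, c = views[0].shape
    grid = np.zeros((rows * h, cols * w, c), dtype=views[0].dtype)
    for i, view in enumerate(views):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = view
    return grid

# Example: six 320x320 RGB views packed into a single 960x640 target image.
views = [np.random.rand(320, 320, 3).astype(np.float32) for _ in range(6)]
print(tile_views(views).shape)  # (960, 640, 3)
```

Modeling all six tiles as one frame is what lets ordinary 2D attention inside the diffusion model enforce cross-view consistency.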
3D-GPT is a pioneering framework that simplifies 3D asset modeling in the metaverse era by utilizing large language models (LLMs). Developed collaboratively by teams from the Australian National University, University of Oxford, and Beijing Academy of Artificial Intelligence, 3D-GPT breaks down complex 3D modeling tasks into manageable segments, employing LLMs as adept problem-solvers. The framework consists of three key agents – task dispatch, conceptualization, and modeling – working together to enhance initial scene descriptions and seamlessly integrate procedural generation. Demonstrating reliability and effective collaboration with human designers, 3D-GPT not only streamlines traditional 3D modeling but also integrates smoothly with Blender, expanding manipulation possibilities. This innovative approach underscores the substantial potential of language models in shaping the future of 3D modeling, particularly in scene generation and animation.
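As a rough illustration of the three-agent decomposition, the sketch below wires task dispatch, conceptualization, and modeling together around a hypothetical call_llm helper; the function names and prompts are placeholders, not the actual 3D-GPT interfaces.

```python
# Sketch of the agent pipeline: task dispatch -> conceptualization -> modeling.
def call_llm(prompt: str) -> str:
    """Stand-in for a large-language-model call; plug in a real client here."""
    raise NotImplementedError

def task_dispatch(scene_description: str) -> list:
    # Split the scene description into procedural sub-tasks (terrain, trees, sky, ...).
    return call_llm(f"List the procedural sub-tasks needed for: {scene_description}").splitlines()

def conceptualize(subtask: str, scene_description: str) -> str:
    # Enrich a sub-task with concrete attributes before any geometry is generated.
    return call_llm(f"Expand '{subtask}' with concrete parameters for: {scene_description}")

def model(enriched_subtask: str) -> str:
    # Map the enriched description to procedural-generation calls (e.g. Blender scripts).
    return call_llm(f"Write procedural generation code for: {enriched_subtask}")

def generate_scene(scene_description: str) -> list:
    return [model(conceptualize(t, scene_description)) for t in task_dispatch(scene_description)]
```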
Paint3D introduces a groundbreaking generative framework for creating high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes, conditioned on text or image inputs. Addressing the challenge of generating quality textures without embedded illumination information, the method employs a coarse-to-fine approach. It initially utilizes a pre-trained depth-aware 2D diffusion model for multi-view texture fusion, producing a coarse texture map. To overcome incomplete areas and illumination artifacts, separate UV Inpainting and UVHD diffusion models are trained for shape-aware refinement. The resulting process yields high-quality 2K UV textures with semantic consistency, allowing for significant advancements in texturing 3D objects and providing flexibility for re-lighting and editing within modern graphics pipelines.
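The multi-view fusion step can be pictured as a weighted blend of per-view textures back-projected into UV space; the sketch below is a simplified, runnable illustration, and the weighting scheme and resolution are assumptions rather than the exact Paint3D formulation.

```python
import numpy as np

def fuse_uv_textures(uv_textures, weights, eps=1e-6):
    """uv_textures: (V, H, W, 3) per-view textures back-projected into UV space;
    weights: (V, H, W) per-texel visibility/confidence for each view."""
    w = weights[..., None]                                        # (V, H, W, 1)
    return (uv_textures * w).sum(axis=0) / (w.sum(axis=0) + eps)  # (H, W, 3) coarse texture

# Example at a reduced 512x512 resolution (the real pipeline targets 2K UV maps).
views = np.random.rand(4, 512, 512, 3).astype(np.float32)
visibility = (np.random.rand(4, 512, 512) > 0.5).astype(np.float32)
coarse_uv = fuse_uv_textures(views, visibility)
print(coarse_uv.shape)  # (512, 512, 3)
```

Texels never covered by any view keep near-zero weight, which is exactly the gap the UV Inpainting and UVHD refinement stages are trained to fill.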
Gsgen (Gaussian Splatting based text-to-3D GENeration) is a novel approach for generating high-quality, multi-view consistent 3D assets. Because previous methods often lacked accurate geometry and fidelity, Gsgen leverages 3D Gaussian Splatting as its representation to address these limitations. The approach follows a progressive optimization strategy with a geometry optimization stage and an appearance refinement stage: the former establishes a coarse representation under a 3D geometry prior, while the latter iteratively refines the obtained Gaussians to enhance details. The method proves effective in generating 3D assets with accurate geometry and delicate detail.
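The progressive two-stage optimization can be sketched as follows; the Gaussian attributes mirror standard 3D Gaussian Splatting, but the dummy losses are only stand-ins for the SDS and 3D-prior guidance used in the paper.

```python
import torch

N = 10_000                                            # number of 3D Gaussians (illustrative)
positions = torch.randn(N, 3, requires_grad=True)     # Gaussian centers
scales    = torch.zeros(N, 3, requires_grad=True)     # log-scales
colors    = torch.rand(N, 3, requires_grad=True)      # per-Gaussian RGB
opacities = torch.zeros(N, 1, requires_grad=True)     # pre-sigmoid opacities

# Stage 1: geometry optimization -- shape a coarse representation, mainly via positions.
geometry_opt = torch.optim.Adam([positions, scales], lr=1e-3)
for step in range(100):
    geometry_opt.zero_grad()
    loss = positions.pow(2).mean()                    # placeholder for SDS + 3D geometry prior
    loss.backward()
    geometry_opt.step()

# Stage 2: appearance refinement -- iteratively refine all attributes for fine detail.
appearance_opt = torch.optim.Adam([positions, scales, colors, opacities], lr=5e-4)
for step in range(100):
    appearance_opt.zero_grad()
    loss = colors.var() + opacities.sigmoid().mean()  # placeholder for appearance SDS guidance
    loss.backward()
    appearance_opt.step()
```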
"Magic123" is a two-stage solution for generating high-quality 3D meshes from single images. It uses 2D and 3D priors to optimize a neural radiance field in the first stage, creating a coarse geometry. The second stage utilizes a memory-efficient mesh representation to produce a high-resolution mesh with appealing texture. Through reference view supervision and diffusion priors, the approach generates novel views. The system incorporates a tradeoff parameter for controlling the balance between exploration and precision in the generated geometry.
ControlNet is a neural network structure that adds extra conditions to diffusion models in order to control them. It copies the weights of the model's neural network blocks into a "locked" copy and a "trainable" copy: the trainable copy learns the new condition while the locked copy preserves the production-ready diffusion model. "Zero convolutions" prevent the added branch from distorting the model at the start of training, and no layer is trained from scratch, making it safe to train on small-scale or even personal devices. The text also explains how ControlNet can be used with Stable Diffusion, reusing the SD encoder as a powerful backbone for learning diverse controls; the efficacy of the SD encoder as a backbone is validated by extensive experimental evidence.
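A minimal PyTorch sketch of the locked/trainable-copy idea with zero convolutions is shown below; the toy block and channel sizes are illustrative, not the actual Stable Diffusion architecture.

```python
import copy
import torch
import torch.nn as nn

def zero_conv(channels):
    """1x1 convolution initialized to zero, so the added branch starts as a no-op."""
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.trainable = copy.deepcopy(block)             # trainable copy learns the condition
        self.locked = block                               # frozen, production-ready weights
        for p in self.locked.parameters():
            p.requires_grad_(False)
        self.zero_in = zero_conv(channels)                # zero conv on the condition input
        self.zero_out = zero_conv(channels)               # zero conv on the control output

    def forward(self, x, condition):
        control = self.trainable(x + self.zero_in(condition))
        return self.locked(x) + self.zero_out(control)    # initially identical to the frozen model

# Example: wrap a toy conv block and run a forward pass.
block = nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.SiLU())
ctrl = ControlledBlock(block, channels=64)
x = torch.randn(1, 64, 32, 32)
cond = torch.randn(1, 64, 32, 32)
print(ctrl(x, cond).shape)  # torch.Size([1, 64, 32, 32])
```

Because both zero convolutions start at zero, the wrapped model initially reproduces the frozen diffusion model exactly, and the control signal is learned gradually during fine-tuning.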
Tips for using ControlNet include adding negative prompts and ignoring the canvas height/width sliders. The add-on also supports T2I-Adapter and offers experimental features such as CFG-based ControlNet, Guess Mode, and Multi-ControlNet / Joint Conditioning. The weight and the guidance strength/start/end settings determine how strongly, and over which sampling steps, ControlNet influences the original SD U-Net.