GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation (doi:10.21979/N9/ZQ85KI)

Document Description

Citation

Title:

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Identification Number:

doi:10.21979/N9/ZQ85KI

Distributor:

DR-NTU (Data)

Date of Distribution:

2025-02-05

Version:

1

Bibliographic Citation:

Lan, Yushi; Zhou, Shangchen; Lyu, Zhaoyang; Hong, Fangzhou; Yang, Shuai; Dai, Bo; Pan, Xingang; Loy, Chen Change, 2025, "GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation", https://doi.org/10.21979/N9/ZQ85KI, DR-NTU (Data), V1

Study Description

Citation

Title:

GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation

Identification Number:

doi:10.21979/N9/ZQ85KI

Authoring Entity:

Lan, Yushi (Nanyang Technological University)

Zhou, Shangchen (Nanyang Technological University)

Lyu, Zhaoyang (Shanghai Artificial Intelligence Laboratory)

Hong, Fangzhou (Nanyang Technological University)

Yang, Shuai (Peking University)

Dai, Bo (Shanghai Artificial Intelligence Laboratory)

Pan, Xingang (Nanyang Technological University)

Loy, Chen Change (Nanyang Technological University)

Software used in Production:

PyTorch

LaTeX

Distributor:

DR-NTU (Data)

Access Authority:

Lan, Yushi

Depositor:

Lan, Yushi

Date of Deposit:

2025-01-24

Holdings Information:

https://doi.org/10.21979/N9/ZQ85KI

Study Scope

Keywords:

Computer and Information Science, 3D Generative Models

Abstract:

While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent diffusion model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single/multi-view image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing methods in both text- and image-conditioned 3D generation.

Kind of Data:

Code and Data

Methodology and Processing

Sources Statement

Data Access

Notes:

S-Lab License 1.0
Copyright 2025 S-Lab

Redistribution and use for non-commercial purpose in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

In the event that redistribution and/or use for commercial purpose in source or binary forms, with or without modification is required, please contact the contributor(s) of the work.

Other Study Description Materials

Related Studies

https://nirvanalan.github.io/projects/GA/

Related Publications

Citation

Identification Number:

arXiv:2411.08033

Bibliographic Citation:

Lan, Y., Zhou, S., Lyu, Z., Hong, F., Yang, S., Dai, B., ... & Loy, C. C. (2024). GaussianAnything: Interactive Point Cloud Latent Diffusion for 3D Generation. arXiv preprint arXiv:2411.08033.