File size: 10,086 Bytes
87c126b c1d9c6a 87c126b |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 |
<div align="center">
<h1>
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
</h1>
<div>
<a href='https://github.com/NIRVANALAN' target='_blank'>Yushi Lan</a><sup>1</sup> 
<a href='https://hongfz16.github.io' target='_blank'>Fangzhou Hong</a><sup>1</sup> 
<a href='https://williamyang1991.github.io/' target='_blank'>Shuai Yang</a><sup>2</sup> 
<a href='https://shangchenzhou.com/' target='_blank'>Shangchen Zhou</a><sup>1</sup> 
<a href='https://sg.linkedin.com/in/xuyi-meng-673779208' target='_blank'>Xuyi Meng</a><sup>1</sup> 
<br>
<a href='https://xingangpan.github.io/' target='_blank'>Xingang Pan</a>
<sup>1</sup>
<a href='https://daibo.info/' target='_blank'>Bo Dai</a>
<sup>3</sup>
<a href='https://www.mmlab-ntu.com/person/ccloy/' target='_blank'>Chen Change Loy</a>
<sup>1</sup>  
</div>
<div>
S-Lab, Nanyang Technological University<sup>1</sup>;
<!--   -->
<br>
Wangxuan Institute of Computer Technology, Peking University<sup>2</sup>;
<br>
<!--   -->
Shanghai Artificial Intelligence Laboratory <sup>3</sup>
<!-- <br>
<sup>*</sup>corresponding author -->
</div>
<div>
<!-- <a target="_blank" href="https://colab.research.google.com/github/nirvanalan/E3DGE/blob/main/notebook/CVPR23_E3DGE_Demo.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a> -->
<a href="https://hits.seeyoufarm.com"><img src="https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FNIRVANALAN%2FLN3Diff&count_bg=%2379C83D&title_bg=%23555555&icon=&icon_color=%23E7E7E7&title=hits&edge_flat=false"/></a>
</div>
<br>
<!-- <h4> -->
<strong>
LN3Diff is a feedforward 3D diffusion model that creates high-quality 3D object mesh from text within 8 V100-SECONDS.
</strong>
<!-- </h4> -->
<table>
<tr></tr>
<tr>
<td>
<img src="assets/t23d/standing-hund.gif">
</td>
<td>
<img src="assets/t23d/ufo.gif">
</td>
<td>
<img src="assets/t23d/mast.gif">
</td>
<td>
<img src="assets/t23d/cannon.gif">
</td>
<td>
<img src="assets/t23d/blue-plastic-chair.gif">
</td>
</tr>
<tr>
<td align='center' width='20%'>A standing hund.</td>
<td align='center' width='20%'>An UFO space aircraft.</td>
<td align='center' width='20%'>A sailboat with mast.</td>
<td align='center' width='20%'>An 18th century cannon.</td>
<td align='center' width='20%'>A blue plastic chair.</td>
</tr>
<tr></tr>
</table>
<!-- <br> -->
For more visual results, go checkout our <a href="https://nirvanalan.github.io/projects/ln3diff/" target="_blank">project page</a> :page_with_curl:
<strike>
Codes coming soon :facepunch:
</strike>
This repository contains the official implementation of LN3Diff:
Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
</div>
---
<h4 align="center">
<a href="https://nirvanalan.github.io/projects/ln3diff/" target='_blank'>[Project Page]</a>
β’
<a href="https://arxiv.org/pdf/2403.12019.pdf" target='_blank'>[arXiv]</a>
</h4>
## :mega: Updates
[03/2024] Initial release.
[04/2024] Inference and training codes on Objaverse, ShapeNet and FFHQ are released, including pre-trained model and training dataset.
## :dromedary_camel: TODO
- [x] Release the inference and training code.
- [x] Release the pre-trained checkpoints of ShapeNet and FFHQ.
- [x] Release the pre-trained checkpoints of T23D Objaverse model trained with 30K+ instances dataset.
- [x] Release the stage-1 VAE of Objaverse trained with 80K+ instances dataset.
- [ ] Add Gradio demo.
- [ ] Polish the dataset preparation and training doc.
- [ ] add metrics evaluation scripts and samples.
- [ ] Lint the code.
- [ ] Release the new T23D Objaverse model trained with 80K+ instances dataset.
## :handshake: Citation
If you find our work useful for your research, please consider citing the paper:
```
@misc{lan2024ln3diff,
title={LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation},
author={Yushi Lan and Fangzhou Hong and Shuai Yang and Shangchen Zhou and Xuyi Meng and Bo Dai and Xingang Pan and Chen Change Loy},
year={2024},
eprint={2403.12019},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## :desktop_computer: Requirements
NVIDIA GPUs are required for this project.
We conduct all the training on NVIDIA V100-32GiB (ShapeNet, FFHQ) and NVIDIA A100-80GiB (Objaverse).
We have test the inference codes on NVIDIA V100.
We recommend using anaconda to manage the python environments.
The environment can be created via ```conda env create -f environment_ln3diff.yml```, and activated via ```conda activate ln3diff```.
If you want to reuse your own PyTorch environment, install the following packages in your environment:
```
# first, check whether you have installed pytorch (>=2.0) and xformer.
conda install -c conda-forge openexr-python git
pip install openexr lpips imageio kornia opencv-python tensorboard tqdm timm ffmpeg einops beartype imageio[ffmpeg] blobfile ninja lmdb webdataset opencv-python click torchdiffeq transformers
pip install git+https://github.com/nupurkmr9/vision-aided-gan.
```
## :running_woman: Inference
### Download Models
The pretrained stage-1 VAE and stage-2 LDM can be downloaded via [OneDrive](https://entuedu-my.sharepoint.com/:f:/g/personal/yushi001_e_ntu_edu_sg/ErdRV9hCYvlBioObT1v_LZ4Bnwye3sv6p5qiVZPNhI9coQ?e=nJgp8t).
Put the downloaded checkpoints under ```checkpoints``` folder for inference. The checkpoints directory layout should be
checkpoints
βββ ffhq
β βββ model_joint_denoise_rec_model1580000.pt
βββ objaverse
β βββ model_rec1680000.pt
β βββ model_joint_denoise_rec_model2310000.pt
βββ shapenet
β βββ car
β βββ model_joint_denoise_rec_model1580000.pt
β βββ chair
β βββ model_joint_denoise_rec_model2030000.pt
β βββ plane
β βββ model_joint_denoise_rec_model770000.pt
βββ ...
### Inference Commands
<strong>Note that to extract the mesh, 24GiB VRAM is required.</strong>
#### Stage-1 VAE 3D reconstruction
For (Objaverse) stage-1 VAE 3D reconstruction and extract VAE latents for diffusion learning, please run
```bash
bash shell_scripts/final_release/inference/sample_obajverse.sh
```
which shall give the following result:
The marching-cube extracted mesh can be visualized with Blender/MeshLab:
<img title="a title" alt="Mesh Visualization" src="./assets/stage1_vae_reconstruction/reconstruction_result/mesh-visualization.png">
**We upload the pre-extracted vae latents at [here](https://entuedu-my.sharepoint.com/:f:/g/personal/yushi001_e_ntu_edu_sg/EnXixldDrKhDtrcuPM4vjQYBv06uY58F1mF7f7KVdZ19lQ?e=nXQNdm), which contains the correponding VAE latents (with shape 32x32x12) of 76K G-buffer Objaverse objects. Feel free to use them in your own task.**
For more G-buffer Objaverse examples, download the [demo data](https://entuedu-my.sharepoint.com/:f:/g/personal/yushi001_e_ntu_edu_sg/EoyzVJbMyBhLoKFJbbsq6bYBi1paLwQxIDjTkO1KjI4b1g?e=sJc3rQ).
#### Stage-2 Text-to-3D
We train 3D latent diffusion model on top of the stage-1 extracted latents.
For the following bash inference file, to extract mesh from the generated tri-plane, set ```--export_mesh True```. To change the text prompt, set the ```prompt``` variable. For unconditional sampling, set the cfg guidance ```unconditional_guidance_scale=0```. Feel free to tune the cfg guidance scale to trade off diversity and fidelity.
Note that the diffusion sampling batch size is set to ```4```, which costs around 16GiB VRAM. The mesh extraction of a single instance costs 24GiB VRAM.
For text-to-3D on Objaverse, run
```bash
bash shell_scripts/final_release/inference/sample_obajverse.sh
```
For text-to-3D on ShapeNet, run one of the following commands (which conducts T23D on car, chair and plane.):
```bash
bash shell_scripts/final_release/inference/sample_shapenet_car_t23d.sh
```
```bash
bash shell_scripts/final_release/inference/sample_shapenet_chair_t23d.sh
```
```bash
bash shell_scripts/final_release/inference/sample_shapenet_plane_t23d.sh
```
For text-to-3D on FFHQ, run
```bash
bash shell_scripts/final_release/inference/sample_ffhq_t23d.sh
```
## :running_woman: Training
### Dataset
For Objaverse, we use the rendering provided by [G-buffer Objaverse](https://aigc3d.github.io/gobjaverse/). A demo subset for stage-1 VAE reconstruction can be downloaded from [here](https://entuedu-my.sharepoint.com/:u:/g/personal/yushi001_e_ntu_edu_sg/Eb6LX2x-EgJLpiHbhRxsN9ABnEaSyjG-tsVBcUr_dQ5dnQ?e=JXWQo1). Note that for Objaverse training, we pre-process the raw data into [wds-dataset](https://github.com/webdataset/webdataset) shards for fast and flexible loading. The sample shard data can be found in [here](https://entuedu-my.sharepoint.com/:f:/g/personal/yushi001_e_ntu_edu_sg/ErtZQgnEH5ZItDqdUaiVbJgBe4nhZveJemQRqDW6Xwp7Zg?e=Zqt6Ss).
For ShapeNet, we render our own data with foreground mask for training, which can be downloaded from [here](https://entuedu-my.sharepoint.com/:f:/g/personal/yushi001_e_ntu_edu_sg/EijBXIC_bUNOo0L3wnJKRqoBCqVnhhT_BReYRc1tc_0lrA?e=VQwWOZ). For training, we convert the raw data to LMDB for faster data loading. The pre-processed LMDB file can be downloaded from [here](https://entuedu-my.sharepoint.com/:f:/g/personal/yushi001_e_ntu_edu_sg/Ev7L8Als8K9JtLtj1G23Cc0BTNDbhCQPadxNLLVS7mV2FQ?e=C5woyE).
For FFHQ, we use the pre-processed dataset from [EG3D](https://github.com/NVlabs/eg3d) and compress it into LMDB, which can also be found in the onedrive link above.
### Training Commands
Coming soon.
## :newspaper_roll: License
Distributed under the S-Lab License. See `LICENSE` for more information.
## Contact
If you have any question, please feel free to contact us via `[email protected]` or Github issues. |