lilelife committed on
Commit 8a14c73
1 Parent(s): 5dbd8e7

Update README.md

Files changed (1)
  1. README.md +162 -3

# SyntheOcc

> SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs <br>
> [Leheng Li](https://len-li.github.io), Weichao Qiu, Yingjie Cai, Xu Yan, Qing Lian, Bingbing Liu, Ying-Cong Chen

SyntheOcc is a project focused on synthesizing image data under geometry control (occupancy voxels). This repository provides tools and scripts to process data, train models, and generate synthetic images on the nuScenes dataset using occupancy control.

#### [Project Page](https://len-li.github.io/syntheocc-web) | [Paper](https://arxiv.org/) | [Video](https://len-li.github.io/syntheocc-web/videos/teaser-occedit.mp4) | [Checkpoint](https://huggingface.co/lilelife/SyntheOcc)

Code: https://github.com/EnVision-Research/SyntheOcc

## Table of Contents

- [Installation](#installation)
- [Prepare Dataset](#prepare-dataset)
- [Prepare Checkpoint](#prepare-checkpoint)
- [Train](#train)
- [Inference](#inference)

## Installation

To get started with SyntheOcc, follow these steps:

1. **Clone the repository:**
   ```bash
   git clone https://github.com/Len-Li/SyntheOcc.git
   cd SyntheOcc
   ```

2. **Set up an environment:**
   ```bash
   pip install torch torchvision transformers
   pip install diffusers==0.26.0.dev0
   # We use an old version of diffusers; newer versions may not be compatible.
   # 0.26.0.dev0 is a development version that is not published on PyPI, so it may have to be installed from source.
   # train.sh also uses `accelerate launch` and xformers attention, so `accelerate` and `xformers` may be needed as well.
   ```
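Because the pinned diffusers version matters, it can be worth checking what actually got installed. The following is a minimal sketch and is not part of the repository. It assumes that `python` on your `PATH` is the interpreter you installed the packages into, and the function name `diffusers_version_ok` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with SyntheOcc): compare the installed
# diffusers version against the version pinned in the install step above.
EXPECTED_DIFFUSERS="0.26.0.dev0"

# Returns 0 if the given version string equals the pinned version.
diffusers_version_ok() {
  [ "$1" = "$EXPECTED_DIFFUSERS" ]
}

# Ask Python for diffusers.__version__. If Python or diffusers is missing,
# errors are silenced and a marker string is used instead.
installed=$(python -c "import diffusers; print(diffusers.__version__)" 2>/dev/null || echo "not installed")

if diffusers_version_ok "$installed"; then
  echo "diffusers $installed matches the pinned version"
else
  echo "warning: expected diffusers $EXPECTED_DIFFUSERS, found: $installed" >&2
fi
```

The check uses exact string equality on purpose. A newer release such as `0.26.0` is reported as a mismatch too, because the README only vouches for the development version.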

## Prepare Dataset

To use SyntheOcc, follow the steps below:

1. **Download the nuScenes dataset:**
   - Register and download the dataset from the [nuScenes website](https://www.nuscenes.org/nuscenes).
   - Download the [train](https://github.com/JeffWang987/OpenOccupancy/releases/tag/train_pkl)/[val](https://github.com/JeffWang987/OpenOccupancy/releases/tag/val_pkl) pickle files from OpenOccupancy and put them in the `data/nuscenes` folder.

   After preparation, the directory structure should look like this:

   ```
   SyntheOcc/
   ├── data/
   │   ├── nuscenes/
   │   │   ├── samples/
   │   │   ├── sweeps/
   │   │   ├── v1.0-trainval/
   │   │   ├── nuscenes_occ_infos_train.pkl
   │   │   ├── nuscenes_occ_infos_val.pkl
   ```

2. **Download the occupancy annotations from [SurroundOcc](https://github.com/weiyithu/SurroundOcc/blob/main/docs/data.md)**

   Generate the high-resolution (0.2 m) occupancy from the mesh vertices and put it in the `data/nuscenes` folder.

   Alternatively, you can download the 0.5 m occupancy, although its precision is lower than that of the 0.2 m version.

3. **Run the script to convert the occupancy to 3D semantic multiplane images (MPIs):**
   ```bash
   torchrun utils/gen_mtp.py
   ```
   This generates the 3D semantic MPIs and saves them in the `data/nuscenes/samples_syntheocc_surocc/` folder.
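Before training, you can confirm that the files from the steps above are where the scripts expect them. The following is a minimal sketch and is not part of the SyntheOcc repository:

- The function name `check_nuscenes_layout` is hypothetical.
- The entry list is taken from the directory tree and the MPI output folder named above.
- `samples_syntheocc_surocc` only exists after step 3.
- The step 2 occupancy files are not checked, because their folder name is not given here.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check (not part of the SyntheOcc repository): report
# which expected dataset entries are missing under a nuScenes data root.
check_nuscenes_layout() {
  local root="${1:-data/nuscenes}"
  local missing=0
  local entry
  # Entries from the directory tree above, plus the MPI output folder of step 3.
  for entry in samples sweeps v1.0-trainval \
               nuscenes_occ_infos_train.pkl nuscenes_occ_infos_val.pkl \
               samples_syntheocc_surocc; do
    if [ ! -e "$root/$entry" ]; then
      echo "missing: $root/$entry" >&2
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every expected entry exists.
  [ "$missing" -eq 0 ]
}
```

Run it from the repository root, for example `check_nuscenes_layout data/nuscenes && echo "layout OK"`. Each missing entry is reported on stderr.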

## Prepare Checkpoint
Our model is based on [stable-diffusion-v2-1](https://huggingface.co/stabilityai/stable-diffusion-v2-1). Please download it and put it in `./SyntheOcc/ckp/`.

The SyntheOcc checkpoint is released on [Hugging Face](https://huggingface.co/lilelife/SyntheOcc). To run inference with our model, put it in `./SyntheOcc/ckp/` as well.

## Train

```bash
bash train.sh
```
The contents of `train.sh` are as follows:
```bash
export WANDB_DISABLED=True
export HF_HUB_OFFLINE=True

export MODEL_DIR="./ckp/stable-diffusion-v2-1"

export EXP_NAME="train_syntheocc"
export OUTPUT_DIR="./ckp/$EXP_NAME"
export SAVE_IMG_DIR="vis_dir/$EXP_NAME/samples"
export DATA_USED="samples_syntheocc_surocc"

accelerate launch --gpu_ids 0, --num_processes 1 --main_process_port 3226 train.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--width=800 \
--height=448 \
--learning_rate=2e-5 \
--num_train_epochs=6 \
--train_batch_size=1 \
--mixed_precision="fp16" \
--num_validation_images=2 \
--validation_steps=1000 \
--checkpointing_steps=5000 \
--checkpoints_total_limit=10 \
--ctrl_channel=257 \
--enable_xformers_memory_efficient_attention \
--report_to='wandb' \
--use_cbgs=True \
--mtp_path='samples_syntheocc_surocc' \
--resume_from_checkpoint="latest"
```

Training takes 1 to 2 days to complete, depending on the hardware. We use a fixed batch size of 1 and an image resolution of 800 × 448 (width × height), which requires 25 GB of memory per GPU.
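`train.sh` saves a checkpoint every 5000 steps (`--checkpointing_steps=5000`) and keeps at most 10 (`--checkpoints_total_limit=10`). It also resumes with `--resume_from_checkpoint="latest"`.

The following sketch shows which folder "latest" would resolve to. It assumes that `train.py` follows the diffusers ControlNet example acknowledged below, which names these folders `checkpoint-<step>` inside `$OUTPUT_DIR`. This naming is an assumption, and the function name `latest_checkpoint` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the checkpoint folder with the highest step
# number, assuming checkpoint-<step> folders as in the diffusers ControlNet example.
latest_checkpoint() {
  local output_dir="${1:-./ckp/train_syntheocc}"
  local best="" best_step=-1 dir step
  for dir in "$output_dir"/checkpoint-*; do
    [ -d "$dir" ] || continue           # skip if the glob matched nothing
    step="${dir##*checkpoint-}"         # numeric suffix after "checkpoint-"
    case "$step" in
      ''|*[!0-9]*) continue ;;          # ignore non-numeric suffixes
    esac
    if [ "$step" -gt "$best_step" ]; then
      best_step="$step"
      best="$dir"
    fi
  done
  # Print the result; return non-zero if no checkpoint folder was found.
  [ -n "$best" ] && echo "$best"
}
```

Calling `latest_checkpoint ./ckp/train_syntheocc` prints the newest checkpoint folder. Step numbers are compared numerically rather than as strings, so `checkpoint-10000` correctly wins over `checkpoint-5000`.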

## Inference

```bash
bash infer.sh
```
The generated images are saved in `./ckp/$EXP_NAME/samples`. An example result is shown below:
![image](./ckp/demo.jpg)

## Acknowledgment
We thank the authors of the following open-source projects:

- [SurroundOcc](https://github.com/weiyithu/SurroundOcc) (Occupancy annotation)
- [OpenOccupancy](https://github.com/JeffWang987/OpenOccupancy) (Occupancy annotation)
- [MagicDrive](https://github.com/cure-lab/MagicDrive) (Cross-view and cross-frame attention implementation)
- [Diffusers controlnet example](https://github.com/huggingface/diffusers/tree/main/examples/controlnet) (Diffusion model implementation)

## BibTeX

```bibtex
@article{li2024SyntheOcc,
  title={SyntheOcc: Synthesize Geometric-Controlled Street View Images through 3D Semantic MPIs},
  author={Li, Leheng and Qiu, Weichao and Cai, Yingjie and Yan, Xu and Lian, Qing and Liu, Bingbing and Chen, Ying-Cong},
  journal={arXiv preprint},
  year={2024}
}
```

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---
license: mit
---