Add Urbandiff

metadriverse · Mar 17, 2024 · 6addf39
1 parent 2b1a75b
commit 6addf39

Showing 5 changed files with 139 additions and 0 deletions.
diff --git a/_bibliography/papers.bib b/_bibliography/papers.bib
@@ -1,6 +1,16 @@
 ---
 ---
 
+@inproceedings{urbandiff,
+    abbr={arXiv},
+    title={Urban Scene Diffusion through Semantic Occupancy Map},
+    author={Junge Zhang and Qihang Zhang and Li Zhang and Ramana Rao Kompella and Gaowen Liu and Bolei Zhou},
+    booktitle={Preprint},
+    year={2024},
+    website={https://metadriverse.github.io/urbandiff},
+    teaser={cover_urbandiff.mp4}
+}
+
 @inproceedings{peng2023learning,
   abbr={NeurIPS <b>Spotlight</b>},
   title={Learning from Active Human Involvement through Proxy Value Propagation},

diff --git a/_pages/UrbanDiff.md b/_pages/UrbanDiff.md
@@ -0,0 +1,129 @@
+---
+layout: research
+permalink: /urbandiff/
+title: "Urban Scene Diffusion through Semantic Occupancy Map"
+page_title: "Urban Scene Diffusion through Semantic Occupancy Map"
+authors:
+
+- {name: "Junge Zhang", url: "#", institution: "1,4"}
+- {name: "Qihang Zhang", url: "#", institution: "2"}
+- {name: "Li Zhang", institution: "1"}
+- {name: "Ramana Rao Kompella", institution: "3"}
+- {name: "Gaowen Liu",  institution: "3"}
+- {name: "Bolei Zhou", url: "#", institution: "4"}
+
+institutions:
+
+- {name: "Fudan University", institution: "1"}
+- {name: "The Chinese University of Hong Kong", institution: "2"}
+- {name: "Cisco", institution: "3"}
+- {name: "University of California, Los Angeles", institution: "4"}
+
+
+nav: false
+nav_order: 1
+code_link: https://github.com/Andy-zd/Urbandiff
+pdf_link: 
+
+---
+
+
+
+
+## Overview
+<div class="teaser">
+    <video style="display:block; width:100%; height:auto;" controls autoplay loop>
+                    <source src="https://github.com/Andy-zd/material/releases/download/videos/waymo.mp4" type="video/mp4">
+    </video>
+    <div class="teaser-caption">
+        Fig. 1: Diverse individual scenes generated by UrbanDiffusion.
+    </div>
+</div>
+
+This work presents UrbanDiffusion, a novel 3D diffusion model that generates large-scale urban scenes from Bird's-Eye View (BEV) maps. The model captures both the geometry and the semantics of urban structures and objects, going beyond visual appearance alone. It learns the distribution of scene-level structures in a latent space, which allows it to generate diverse urban scenes at arbitrary scale. Trained on a real-world driving dataset, the model generates scenes from both held-out BEV maps and maps synthesized by a driving simulator. We also show that the generated scenes can be used for scene image synthesis with a pretrained image generator.
+
+<!--research-section-splitter-->
+
+## Method
+
+<div class="teaser">
+    <img src="../assets/img/urbandiff/occ_pipeline.pdf">
+    <div class="teaser-caption">
+        Fig. 2: Framework of UrbanDiffusion.
+    </div>
+</div>
+To train the diffusion model quickly and with a small memory footprint, we follow the latent diffusion approach and learn the distribution of the 3D data in a latent space. We first embed the 3D semantic data into a lower-dimensional space and then apply classifier-free guidance to the diffusion process in this latent feature space. Given a BEV layout, the trained model generates diverse and realistic samples that contain both scene geometry and semantic information. Sampling combines the noise predicted with the BEV condition and the noise predicted with an empty condition, weighted by a guidance scale w:
+<div align="center">
+$$
+\hat{\epsilon}_\theta ( z_t |c_{bev})=  \epsilon_{\theta}(z_t|\phi) + w \cdot (\epsilon_{\theta}(z_t|c_{bev})-\epsilon_{\theta}(z_t|\phi)).
+$$
+</div>
+
+
+<!--research-section-splitter-->
+
+
+## Scene Generation
+**Conditioning on a single-frame BEV map**
+<br>
+We demonstrate generated samples on BEV maps from different sources, including the nuScenes validation set, the Waymo Motion Dataset, the nuPlan Dataset, and MetaDrive procedurally generated maps.
+
+<strong>nuScenes Validation set</strong>
+<video style="display:block; width:100%; height:auto;" controls autoplay loop>
+    <source src="https://github.com/Andy-zd/material/releases/download/videos/nusc_val.mp4" type="video/mp4">
+</video>
+
+<strong>nuPlan Dataset</strong>
+<video style="display:block; width:100%; height:auto;" controls autoplay loop>
+    <source src="https://github.com/Andy-zd/material/releases/download/videos/nuplan.mp4" type="video/mp4">
+</video>
+
+<strong>MetaDrive Procedural Generation Map</strong>
+<video style="display:block; width:100%; height:auto;" controls autoplay loop>
+    <source src="https://github.com/Andy-zd/material/releases/download/videos/pgmap.mp4" type="video/mp4">
+</video>
+
+**Large-scale Scene Generation**
+
+<video style="display:block; width:100%; height:auto;" controls autoplay loop>
+  <source src="https://github.com/Andy-zd/material/releases/download/videos/large_scene_generation.mp4" type="video/mp4">
+</video>
+<br>
+
+<!--research-section-splitter-->
+
+
+## Scene Synthesis
+**Synthesis on scenes from different datasets**
+<br>
+We perform scene synthesis on scenes sampled from diverse BEV maps:
+
+<video style="display:block; width:100%; height:auto;" controls autoplay loop>
+  <source src="https://github.com/Andy-zd/material/releases/download/videos/short_scene_synthesis.mp4" type="video/mp4">
+</video>
+<br>
+
+<!--research-section-splitter-->
+
+
+
+## Reference
+
+```plain
+@inproceedings{urbandiff,
+    abbr={arXiv},
+    title={Urban Scene Diffusion through Semantic Occupancy Map},
+    author={Junge Zhang and Qihang Zhang and Li Zhang and Ramana Rao Kompella and Gaowen Liu and Bolei Zhou},
+    booktitle={Preprint},
+    year={2024},
+}
+```
+
+<br>
+<!--research-section-splitter-->
+
+## Acknowledgement
+
+This work was supported by the Cisco Faculty Award.
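
The classifier-free guidance rule in the Method section of `_pages/UrbanDiff.md` can be illustrated with a minimal Python sketch. It is not the authors' implementation: `guided_noise` and `toy_eps` are hypothetical names, the noise predictor is a toy stand-in for the trained network, and passing `None` as the condition is only an assumed way to represent the empty condition. The sketch shows how the guidance weight `w` moves the prediction from the unconditional estimate (`w = 0`) to the conditional one (`w = 1`) and beyond (`w > 1`).

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical predictor signature: (z_t, condition or None) -> predicted noise.
NoisePredictor = Callable[[Sequence[float], Optional[Sequence[float]]], List[float]]


def guided_noise(
    eps_theta: NoisePredictor,
    z_t: Sequence[float],
    c_bev: Sequence[float],
    w: float,
) -> List[float]:
    """Return eps(z_t | empty) + w * (eps(z_t | c_bev) - eps(z_t | empty))."""
    eps_uncond = eps_theta(z_t, None)   # prediction with the empty condition
    eps_cond = eps_theta(z_t, c_bev)    # prediction with the BEV condition
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def toy_eps(z_t: Sequence[float], cond: Optional[Sequence[float]]) -> List[float]:
    """Toy stand-in for a trained network, for illustration only."""
    if cond is None:
        return [0.5 * z for z in z_t]
    return [0.5 * z + c for z, c in zip(z_t, cond)]


z_t = [1.0, -2.0, 4.0]      # toy noisy latent
c_bev = [1.0, 1.0, 1.0]     # toy BEV condition embedding

eps_w0 = guided_noise(toy_eps, z_t, c_bev, w=0.0)  # purely unconditional
eps_w1 = guided_noise(toy_eps, z_t, c_bev, w=1.0)  # purely conditional
eps_w3 = guided_noise(toy_eps, z_t, c_bev, w=3.0)  # extrapolated past the conditional estimate
```

In a real sampler the guided noise would replace the raw network output at every denoising step; the value of `w` used for the results shown on the page is not stated in this commit.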
+
diff --git a/assets/img/urbandiff/occ_pipeline.pdf b/assets/img/urbandiff/occ_pipeline.pdf
diff --git a/assets/teaser/cover_urbandiff.mp4 b/assets/teaser/cover_urbandiff.mp4
diff --git a/run.sh b/run.sh