File size: 3,830 Bytes
b78ed69 fa85ab7 b78ed69 fa85ab7 b78ed69 fa85ab7 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 |
---
license: creativeml-openrail-m
datasets:
- manycore-research/SpatialGen-Testset
base_model:
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
---
# SpatialGen
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<picture>
<source srcset="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/myrWYVNd4m-DuxV39VQZ0.png" media="(prefers-color-scheme: dark)">
<img src="https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/QQvDtmokH4ZjwH0wppqFC.png" width="60%" alt="SpatialLM""/>
</picture>
</div>
<hr style="margin-top: 0; margin-bottom: 8px;">
<div align="center" style="margin-top: 0; padding-top: 0; line-height: 1;">
<a href="https://github.com/manycore-research/SpatialGen" target="_blank" style="margin: 2px;"><img alt="GitHub"
src="https://img.shields.io/badge/GitHub-SpatialGen-24292e?logo=github&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
<a href="https://huggingface.co/manycore-research/SpatialGen-1.0" target="_blank" style="margin: 2px;"><img alt="Hugging Face"
src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-SpatialGen-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/></a>
</div>
<div align="center">
| Image-to-Scene Results | Text-to-Scene Results |
| :--------------------: | :-------------------: |
|  |  |
<p>SpatialGen produces multi-view, multi-modal information from a semantic layout using a multi-view, multi-modal diffusion model.</p>
</div>
## ✨ News
- [Aug, 2025] Initial release of SpatialGen-1.0!
## SpatialGen Models
<div align="center">
| **Model** | **Download** |
| :-------------: | -------------------------------------------------------------------------- |
| SpatialGen-1.0 | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialGen-1.0) |
</div>
## Usage
### 🔧 Installation
Tested with the following environment:
* Python 3.10
* PyTorch 2.3.1
* CUDA Version 12.1
```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```
### 📊 Dataset
We provide [SpatialGen-Testset](https://huggingface.co/datasets/manycore-research/SpatialGen-Testset) with 48 rooms, which labeled with 3D layout and 4.8K rendered images (48 x 100 views, including RGB, normal, depth maps and semantic maps) for MVD inference.
### Inference
```bash
# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh
# Text-to-image-to-3D Scene
bash scripts/infer_spatialgen_t2s.sh
```
## License
[SpatialGen-1.0](https://huggingface.co/manycore-research/SpatialGen-1.0) is derived from [Stable-Diffusion-v2.1](https://github.com/Stability-AI/stablediffusion), which is licensed under the [CreativeML Open RAIL++-M License](https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL).
## Acknowledgements
We would like to thank the following projects that made this work possible:
[DiffSplat](https://github.com/chenguolin/DiffSplat) | [SD 2.1](https://github.com/Stability-AI/stablediffusion) | [TAESD](https://github.com/madebyollin/taesd) | [SpatialLM](https://github.com/manycore-research/SpatialLM)
|