---
base_model:
- stabilityai/stable-diffusion-2-1
datasets:
- manycore-research/SpatialGen-Testset
license: creativeml-openrail-m
pipeline_tag: image-to-3d
---

# SpatialGen: Layout-guided 3D Indoor Scene Generation

| Image-to-Scene Results | Text-to-Scene Results |
| :--------------------------------------: | :----------------------------------------: |
| ![Img2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/ksN5t8QEu3Iv6KhpsYsk6.png) | ![Text2Scene](https://cdn-uploads.huggingface.co/production/uploads/6437c0ead38ce48bdd4b0067/waCRa3kp01KAsKgmqS1bb.png) |

TL;DR: Given a 3D semantic layout, SpatialGen can generate a 3D indoor scene conditioned on either a reference image (left) or a textual description (right) using a multi-view, multi-modal diffusion model.

## ✨ News

- [Sep, 2025] We released the paper of SpatialGen!
- [Aug, 2025] Initial release of SpatialGen-1.0!

## 📋 Release Plan

- [x] Provide inference code of SpatialGen.
- [ ] Provide training instruction for SpatialGen.
- [ ] Release SpatialGen dataset.

## SpatialGen Models

| **Model** | **Download** |
| :-----------------------: | -------------------------------------------------------------------------------------|
| SpatialGen-1.0 | [🤗 HuggingFace](https://huggingface.co/manycore-research/SpatialGen-1.0) |
| FLUX.1-Layout-ControlNet | [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) |
| FLUX.1-Wireframe-dev-lora | [🤗 HuggingFace](https://huggingface.co/manycore-research/FLUX.1-Wireframe-dev-lora) |
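
If you want to fetch the checkpoints in the table ahead of time, the sketch below prints one `huggingface-cli download` command per model. It is an illustration only: the `checkpoints/` target directory, the `CKPT_DIR` and `DRY_RUN` variables, and the `download_cmd` helper are assumptions made for this example, not options of the SpatialGen scripts, which may expect a different location or download the weights on their own. It also assumes the `huggingface_hub` CLI is installed; some repositories may require `huggingface-cli login` first.

```bash
# Hypothetical helper: print (or run) a download command for each checkpoint.
# CKPT_DIR and DRY_RUN are assumptions for this sketch, not SpatialGen options.
CKPT_DIR="${CKPT_DIR:-checkpoints}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Model names as listed in the table above.
MODELS="SpatialGen-1.0 FLUX.1-Layout-ControlNet FLUX.1-Wireframe-dev-lora"

download_cmd() {
  # $1 = model name under the manycore-research organization
  echo "huggingface-cli download manycore-research/$1 --local-dir $CKPT_DIR/$1"
}

for name in $MODELS; do
  cmd="$(download_cmd "$name")"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$cmd"
  else
    $cmd
  fi
done
```

By default the loop only prints the commands so they can be reviewed first; running the script with `DRY_RUN=0` executes them.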

## Usage

### 🔧 Installation

Tested with the following environment:

* Python 3.10
* PyTorch 2.3.1
* CUDA Version 12.1

```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialGen.git
cd SpatialGen

python -m venv .venv
source .venv/bin/activate

pip install -r requirements.txt
# Optional: fix the [flux inference bug](https://github.com/vllm-project/vllm/issues/4392)
pip install nvidia-cublas-cu12==12.4.5.8
```

### 📊 Dataset

We provide [SpatialGen-Testset](https://huggingface.co/datasets/manycore-research/SpatialGen-Testset), which contains 48 rooms, each labeled with a 3D layout, and 4.8K rendered images (48 x 100 views, including RGB, normal, depth, and semantic maps) for MVD inference.

### Inference

```bash
# Single image-to-3D Scene
bash scripts/infer_spatialgen_i2s.sh

# Text-to-image-to-3D Scene
# in captions/spatialgen_testset_captions.jsonl, we provide text prompts of different styles for each room,
# choose a pair of scene_id and prompt to run the text2scene experiment
bash scripts/infer_spatialgen_t2s.sh
```

## License

[SpatialGen-1.0](https://huggingface.co/manycore-research/SpatialGen-1.0) is derived from [Stable-Diffusion-v2.1](https://github.com/Stability-AI/stablediffusion), which is licensed under the [CreativeML Open RAIL++-M License](https://github.com/Stability-AI/stablediffusion/blob/main/LICENSE-MODEL).

[FLUX.1-Layout-ControlNet](https://huggingface.co/manycore-research/FLUX.1-Layout-ControlNet) is licensed under the [FLUX.1-dev Non-Commercial License](https://github.com/black-forest-labs/flux/blob/main/model_licenses/LICENSE-FLUX1-dev).
## Acknowledgements

We would like to thank the following projects that made this work possible:

[DiffSplat](https://github.com/chenguolin/DiffSplat) | [SD 2.1](https://github.com/Stability-AI/stablediffusion) | [TAESD](https://github.com/madebyollin/taesd) | [FLUX](https://github.com/black-forest-labs/flux/) | [SpatialLM](https://github.com/manycore-research/SpatialLM)

## Citation

```bibtex
@article{SpatialGen,
  title         = {SpatialGen: Layout-guided 3D Indoor Scene Generation},
  author        = {Fang, Chuan and Li, Heng and Liang, Yixu and Zheng, Jia and Mao, Yongsen and Liu, Yuan and Tang, Rui and Zhou, Zihan and Tan, Ping},
  journal       = {arXiv preprint},
  year          = {2025},
  eprint        = {2509.14981},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV}
}
```