Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets
The model was presented in the paper Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets.
Abstract
Recent advances in 3D-native generative models have accelerated asset creation for games, film, and design. However, most methods still rely primarily on image or text conditioning and lack fine-grained, cross-modal controls, which limits controllability and practical adoption. To address this gap, we present Hunyuan3D-Omni, a unified framework for fine-grained, controllable 3D asset generation built on Hunyuan3D 2.1. In addition to images, Hunyuan3D-Omni accepts point clouds, voxels, bounding boxes, and skeletal pose priors as conditioning signals, enabling precise control over geometry, topology, and pose. Instead of separate heads for each modality, our model unifies all signals in a single cross-modal architecture. We train with a progressive, difficulty-aware sampling strategy that selects one control modality per example and biases sampling toward harder signals (e.g., skeletal pose) while downweighting easier ones (e.g., point clouds), encouraging robust multi-modal fusion and graceful handling of missing inputs. Experiments show that these additional controls improve generation accuracy, enable geometry-aware transformations, and increase robustness for production workflows.
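The difficulty-aware sampling described in the abstract can be pictured with a minimal sketch. It is not the authors' training code. The weights below are hypothetical, and the abstract only states that skeletal pose is treated as harder and point clouds as easier, so the relative placement of bounding boxes and voxels is an assumption. The progressive schedule (how the weights change over training) is not modeled here.

```python
import random
from collections import Counter

# Hypothetical sampling weights: harder signals get larger weights.
# These values are illustrative only and do not come from the paper.
MODALITY_WEIGHTS = {
    "pose": 0.4,   # described as a harder signal, so sampled more often
    "bbox": 0.3,   # assumed placement
    "voxel": 0.2,  # assumed placement
    "point": 0.1,  # described as an easier signal, so downweighted
}


def sample_control_modality(rng, weights=MODALITY_WEIGHTS):
    """Pick exactly one control modality for a training example."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


rng = random.Random(0)
counts = Counter(sample_control_modality(rng) for _ in range(10_000))
print(counts.most_common())
```

Because each example draws a single modality, the model sees every control signal in isolation during training, and the bias shifts more of the training budget to the signals assumed to be harder.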
Hunyuan3D-Omni
Hunyuan3D-Omni is a unified framework for the controllable generation of 3D assets that inherits the structure of Hunyuan3D 2.1. Unlike Hunyuan3D 2.1, it adds a unified control encoder that introduces additional control signals: point cloud, voxel, skeleton, and bounding box.
Multi-Modal Conditional Control
- Bounding Box Control: Generate 3D models constrained by 3D bounding boxes
- Pose Control: Create 3D human models with specific skeletal poses
- Point Cloud Control: Generate 3D models guided by input point clouds
- Voxel Control: Create 3D models from voxel representations
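To make the control signals above more concrete, the following sketch derives an axis-aligned bounding box and a set of occupied voxels from a small point cloud. It is an illustration under stated assumptions, not the preprocessing used by this repository: the function names, the grid resolution, and the normalization by the longest side are all hypothetical choices.

```python
from typing import Sequence, Set, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]


def bounding_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return the axis-aligned bounding box as (min_corner, max_corner)."""
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def voxelize(points: Sequence[Point], resolution: int) -> Set[Cell]:
    """Map points to occupied cells of a resolution^3 grid anchored at the box minimum."""
    lo, hi = bounding_box(points)
    # Scale by the longest side so voxels stay cubic; fall back to 1.0
    # when all points coincide, to avoid dividing by zero.
    extent = max(h - l for l, h in zip(lo, hi)) or 1.0
    occupied = set()
    for p in points:
        cell = tuple(
            min(int((c - l) / extent * resolution), resolution - 1)
            for c, l in zip(p, lo)
        )
        occupied.add(cell)
    return occupied


points = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.25), (0.5, 0.5, 0.5), (1.0, 1.0, 1.0)]
box_min, box_max = bounding_box(points)
voxels = voxelize(points, resolution=4)
print(box_min, box_max, sorted(voxels))
```

The three representations carry decreasing amounts of detail: the point cloud fixes surface samples, the voxel grid fixes coarse occupancy, and the bounding box fixes only the overall extent.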
Models Zoo
Generation requires 10 GB of VRAM.
Model | Description | Date | Size | Huggingface |
---|---|---|---|---|
Hunyuan3D-Omni | Image to Shape Model with multi-modal control | 2025-09-25 | 3.3B | Download |
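Before running generation, it can be useful to check whether the GPU has enough free memory. The sketch below is a hypothetical helper, not part of this repository. It interprets the 10 GB figure as 10 GiB, which is an assumption, and it uses `torch.cuda.mem_get_info` only when PyTorch and a CUDA device are available.

```python
from typing import Optional

REQUIRED_GIB = 10.0  # taken from the VRAM note above; the GiB interpretation is an assumption


def has_enough_vram(free_bytes: int, required_gib: float = REQUIRED_GIB) -> bool:
    """Return True if the free memory (in bytes) covers the requirement."""
    return free_bytes >= required_gib * 1024**3


def query_free_vram(device: int = 0) -> Optional[int]:
    """Return free GPU memory in bytes, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info(device)
    return free_bytes


if __name__ == "__main__":
    free = query_free_vram()
    if free is None:
        print("Could not query GPU memory (no PyTorch or no CUDA device).")
    else:
        print(f"Free VRAM: {free / 1024**3:.1f} GiB, enough: {has_enough_vram(free)}")
```

Free memory reported this way reflects the moment of the query, so other processes on the same GPU can still cause an out-of-memory error later.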
Installation
Requirements
We tested the model with Python 3.10.
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
Usage
Inference
Multi-Modal Inference
python inference.py --control_type <control_type> [--use_ema] [--flashvdm]
The control_type parameter has four available options:
- point: Use the point control type for inference.
- voxel: Use the voxel control type for inference.
- bbox: Use the bounding box control type for inference.
- pose: Use the pose control type for inference.
The --use_ema flag enables the Exponential Moving Average (EMA) model for more stable inference.
The --flashvdm flag enables FlashVDM optimization for faster inference.
Choose the control_type that fits your requirements. For example, to use the point control type, run one of the following:
python inference.py --control_type point
python inference.py --control_type point --use_ema
python inference.py --control_type point --flashvdm
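When inference is launched from another script, the same commands can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of the repository: the helper names are illustrative, it relies only on the `--control_type`, `--use_ema`, and `--flashvdm` options documented above, and it assumes `inference.py` is in the current working directory.

```python
import subprocess
import sys
from typing import List

# The four control types documented above.
CONTROL_TYPES = ("point", "voxel", "bbox", "pose")


def build_inference_command(
    control_type: str,
    use_ema: bool = False,
    flashvdm: bool = False,
    script: str = "inference.py",
) -> List[str]:
    """Build the argument list for one inference run."""
    if control_type not in CONTROL_TYPES:
        raise ValueError(
            f"control_type must be one of {CONTROL_TYPES}, got {control_type!r}"
        )
    cmd = [sys.executable, script, "--control_type", control_type]
    if use_ema:
        cmd.append("--use_ema")
    if flashvdm:
        cmd.append("--flashvdm")
    return cmd


def run_inference(control_type: str, **flags: bool) -> None:
    """Run inference in a subprocess and raise if it exits with an error."""
    subprocess.run(build_inference_command(control_type, **flags), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this sketch has no side effects.
    print(" ".join(build_inference_command("point", use_ema=True)))
```

Validating the control type before launching avoids starting a subprocess that would fail on an unsupported value.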
Acknowledgements
We would like to thank the contributors to the TripoSG, Trellis, DINOv2, Stable Diffusion, FLUX, diffusers, HuggingFace, CraftsMan3D, Michelangelo, Hunyuan-DiT, HunyuanVideo, HunyuanWorld-1.0, and HunyuanWorld-Voyager repositories, for their open research and exploration.
Citation
If you use this code in your research, please cite:
@misc{hunyuan3d2025hunyuan3domni,
title={Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2509.21245},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2509.21245},
}
@misc{hunyuan3d2025hunyuan3d,
title={Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2506.15442},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{hunyuan3d22025tencent,
title={Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation},
author={Tencent Hunyuan3D Team},
year={2025},
eprint={2501.12202},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
@misc{yang2024hunyuan3d,
title={Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
author={Tencent Hunyuan3D Team},
year={2024},
eprint={2411.02293},
archivePrefix={arXiv},
primaryClass={cs.CV}
}