Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets

The model was presented in the paper Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets.

Abstract

Recent advances in 3D-native generative models have accelerated asset creation for games, film, and design. However, most methods still rely primarily on image or text conditioning and lack fine-grained, cross-modal controls, which limits controllability and practical adoption. To address this gap, we present Hunyuan3D-Omni, a unified framework for fine-grained, controllable 3D asset generation built on Hunyuan3D 2.1. In addition to images, Hunyuan3D-Omni accepts point clouds, voxels, bounding boxes, and skeletal pose priors as conditioning signals, enabling precise control over geometry, topology, and pose. Instead of separate heads for each modality, our model unifies all signals in a single cross-modal architecture. We train with a progressive, difficulty-aware sampling strategy that selects one control modality per example and biases sampling toward harder signals (e.g., skeletal pose) while downweighting easier ones (e.g., point clouds), encouraging robust multi-modal fusion and graceful handling of missing inputs. Experiments show that these additional controls improve generation accuracy, enable geometry-aware transformations, and increase robustness for production workflows.
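The difficulty-aware sampling described in the abstract can be illustrated with a small sketch. It picks exactly one control modality per training example. As training progresses, it shifts probability toward harder signals such as pose and away from easier ones such as point clouds. Several details here are hypothetical choices for illustration: the weights in TARGET_WEIGHTS, the linear ramp from uniform sampling, and the function names. The paper's actual schedule and weights are not reproduced here.

```python
import random
from collections import Counter

# Hypothetical target weights: larger means sampled more often.
# The abstract only states that harder signals (e.g. pose) are upweighted and
# easier ones (e.g. point clouds) downweighted; these numbers are illustrative.
TARGET_WEIGHTS = {"pose": 4.0, "bbox": 2.0, "voxel": 1.5, "point": 0.5}


def modality_weights(progress):
    """Interpolate from uniform weights (progress=0) to TARGET_WEIGHTS (progress=1)."""
    progress = min(max(progress, 0.0), 1.0)
    return {m: 1.0 + progress * (w - 1.0) for m, w in TARGET_WEIGHTS.items()}


def sample_control_modality(available, progress, rng):
    """Pick exactly one control modality for a training example.

    Only modalities listed in `available` are candidates, so examples with
    missing signals still get a valid draw. Returns None when no control is
    available (image-only conditioning).
    """
    weights = modality_weights(progress)
    candidates = [m for m in available if m in weights]
    if not candidates:
        return None
    return rng.choices(candidates, weights=[weights[m] for m in candidates], k=1)[0]


rng = random.Random(0)
late = Counter(
    sample_control_modality(["point", "voxel", "bbox", "pose"], 1.0, rng)
    for _ in range(10_000)
)
print(late.most_common())  # pose most frequent, point least frequent
```

Restricting the draw to the modalities an example actually has is one simple way to get the graceful handling of missing inputs that the abstract mentions.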

Hunyuan3D-Omni

Hunyuan3D-Omni is a unified framework for the controllable generation of 3D assets that inherits the architecture of Hunyuan3D 2.1. On top of that architecture, Hunyuan3D-Omni adds a unified control encoder that introduces additional control signals: point clouds, voxels, skeletons, and bounding boxes.
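To make the idea of a single control encoder concrete, here is a toy NumPy sketch. It assumes every control signal has already been converted to an (N, 3) point set. A one-hot modality tag is appended to each point, and one shared linear projection produces the control tokens, so no modality gets its own head. ToyUnifiedControlEncoder, the tag scheme, and the token width are hypothetical; this is not the model's actual encoder.

```python
import numpy as np

MODALITIES = ("point", "voxel", "bbox", "pose")


class ToyUnifiedControlEncoder:
    """Hypothetical sketch: one shared projection for every control modality."""

    def __init__(self, dim=64, seed=0):
        rng = np.random.default_rng(seed)
        in_features = 3 + len(MODALITIES)  # xyz + one-hot modality tag
        self.weight = rng.normal(0.0, 0.02, size=(in_features, dim))
        self.bias = np.zeros(dim)

    def __call__(self, points, modality):
        points = np.asarray(points, dtype=np.float64)
        if points.ndim != 2 or points.shape[1] != 3:
            raise ValueError("expected an (N, 3) array of xyz coordinates")
        # One-hot tag tells the shared weights which signal the points came from.
        tag = np.zeros((points.shape[0], len(MODALITIES)))
        tag[:, MODALITIES.index(modality)] = 1.0  # raises ValueError if unknown
        features = np.concatenate([points, tag], axis=1)
        return features @ self.weight + self.bias  # (N, dim) control tokens
```

For example, `ToyUnifiedControlEncoder(dim=64)(np.random.default_rng(1).random((8, 3)), "bbox")` returns one 64-dimensional token per input point. How the real model fuses such tokens with the image condition is not shown here.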

Multi-Modal Conditional Control

Bounding Box Control: Generate 3D models constrained by 3D bounding boxes
Pose Control: Create 3D human models with specific skeletal poses
Point Cloud Control: Generate 3D models guided by input point clouds
Voxel Control: Create 3D models from voxel representations
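One plausible way to feed all four control types through one encoder is to express each of them as a set of 3D points. The helpers below are hypothetical and not part of the released code. They convert an axis-aligned bounding box to its 8 corners, a boolean voxel grid to the centres of its occupied cells, and a skeleton to points sampled along each bone. The skeleton is given as joint positions plus bones, where each bone is a pair of joint indices. A point cloud needs no conversion.

```python
import itertools

import numpy as np


def bbox_to_points(min_corner, max_corner):
    """Return the 8 corners of an axis-aligned box as an (8, 3) array."""
    lo = np.asarray(min_corner, dtype=float)
    hi = np.asarray(max_corner, dtype=float)
    # Each bit pattern picks the min or max value along x, y and z.
    corners = [
        [hi[i] if bit else lo[i] for i, bit in enumerate(bits)]
        for bits in itertools.product((0, 1), repeat=3)
    ]
    return np.array(corners)


def voxels_to_points(occupancy, voxel_size=1.0, origin=(0.0, 0.0, 0.0)):
    """Return the centres of the occupied cells of a boolean (X, Y, Z) grid."""
    idx = np.argwhere(np.asarray(occupancy, dtype=bool))
    return (idx + 0.5) * voxel_size + np.asarray(origin, dtype=float)


def skeleton_to_points(joints, bones, samples_per_bone=4):
    """Sample points evenly along each bone, given as (parent, child) joint indices."""
    joints = np.asarray(joints, dtype=float)
    if not bones:
        return np.empty((0, 3))
    t = np.linspace(0.0, 1.0, samples_per_bone)[:, None]  # (S, 1) interpolation steps
    segments = [joints[a] + t * (joints[b] - joints[a]) for a, b in bones]
    return np.concatenate(segments, axis=0)
```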

🎁 Model Zoo

Generation requires 10 GB of VRAM.

Model	Description	Date	Size	Huggingface
Hunyuan3D-Omni	Image to Shape Model with multi-modal control	2025-09-25	3.3B	Download
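Before running inference, you can check whether the visible GPU meets the 10 GB figure above. This sketch reads total device memory through PyTorch's torch.cuda.get_device_properties. It makes two simplifying assumptions: 10 GB is read as 10 * 10**9 bytes, and the check uses total rather than currently free memory. The helper names are hypothetical.

```python
def has_enough_vram(total_bytes, required_gb=10.0):
    """True if `total_bytes` meets the requirement, reading GB as 10**9 bytes."""
    return total_bytes >= required_gb * 10**9


def cuda_total_memory(device=0):
    """Total memory of a CUDA device in bytes, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory


if __name__ == "__main__":
    total = cuda_total_memory()
    if total is None:
        print("No CUDA device visible to PyTorch.")
    elif has_enough_vram(total):
        print(f"GPU 0 reports {total / 10**9:.1f} GB total memory: meets ~10 GB.")
    else:
        print(f"GPU 0 reports {total / 10**9:.1f} GB total memory: below ~10 GB.")
```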

Installation

Requirements

We tested the model with Python 3.10.

pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
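The pinned versions above can be verified from Python with a short script. It compares the running interpreter with Python 3.10 and the installed torch with 2.5.1. It ignores the local build suffix (such as +cu124) that PyTorch wheels append to __version__. The function names are hypothetical. A mismatch does not necessarily mean the code will fail, only that your setup differs from the tested one.

```python
import sys

TESTED_PYTHON = (3, 10)
PINNED_TORCH = "2.5.1"


def torch_base_version(version):
    """Strip the local build tag, e.g. '2.5.1+cu124' -> '2.5.1'."""
    return str(version).split("+", 1)[0]


def environment_mismatches():
    """List differences from the tested setup (an empty list means it matches)."""
    issues = []
    if sys.version_info[:2] != TESTED_PYTHON:
        issues.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor}, tested with 3.10"
        )
    try:
        import torch
    except ImportError:
        issues.append("torch is not installed")
    else:
        if torch_base_version(torch.__version__) != PINNED_TORCH:
            issues.append(f"torch {torch.__version__}, pinned to {PINNED_TORCH}")
    return issues


if __name__ == "__main__":
    for issue in environment_mismatches() or ["environment matches the tested setup"]:
        print(issue)
```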

Usage

Inference

Multi-Modal Inference

python inference.py --control_type <control_type> [--use_ema] [--flashvdm]

The control_type parameter accepts four values:

point: condition generation on a point cloud.
voxel: condition generation on a voxel grid.
bbox: condition generation on a 3D bounding box.
pose: condition generation on a skeletal pose.

The --use_ema flag runs inference with the Exponential Moving Average (EMA) weights of the model, for more stable results.

The --flashvdm flag enables the FlashVDM optimization for faster inference.
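The interface documented above can be summarised as an argparse definition. This is a hypothetical mirror written from the usage line, not the contents of inference.py. In it, --control_type is required and restricted to the four listed values, while --use_ema and --flashvdm are on/off switches.

```python
import argparse

CONTROL_TYPES = ("point", "voxel", "bbox", "pose")


def build_parser():
    """Hypothetical parser matching the documented command-line interface."""
    parser = argparse.ArgumentParser(description="Sketch of the Hunyuan3D-Omni inference CLI")
    parser.add_argument(
        "--control_type",
        required=True,
        choices=CONTROL_TYPES,
        help="which control signal conditions the generation",
    )
    parser.add_argument("--use_ema", action="store_true", help="use the EMA weights")
    parser.add_argument("--flashvdm", action="store_true", help="enable FlashVDM")
    return parser
```

With this definition, an unknown value such as `--control_type mesh` is rejected with a usage error instead of failing later during generation.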

Choose the control_type that matches your input. For example, to use point cloud control, run one of the following:

python inference.py --control_type point 
python inference.py --control_type point --use_ema
python inference.py --control_type point --flashvdm
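To try every control type in one go, a small driver can call the script once per value. This is a sketch with several assumptions. build_command and run_all are hypothetical helpers, and inference.py is assumed to be in the current directory. Combining --use_ema with --flashvdm is assumed to be allowed, because the usage line lists both as independent optional flags.

```python
import subprocess
import sys

CONTROL_TYPES = ("point", "voxel", "bbox", "pose")


def build_command(control_type, use_ema=False, flashvdm=False, script="inference.py"):
    """Return the argument list for one inference run."""
    if control_type not in CONTROL_TYPES:
        raise ValueError(f"unknown control_type: {control_type!r}")
    cmd = [sys.executable, script, "--control_type", control_type]
    if use_ema:
        cmd.append("--use_ema")
    if flashvdm:
        cmd.append("--flashvdm")
    return cmd


def run_all(use_ema=False, flashvdm=False):
    """Run inference once per control type, stopping at the first failure."""
    for control_type in CONTROL_TYPES:
        cmd = build_command(control_type, use_ema, flashvdm)
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on non-zero exit
```

For example, `run_all(use_ema=True)` runs the four commands in sequence and stops at the first non-zero exit status.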

Acknowledgements

We would like to thank the contributors to the TripoSG, Trellis, DINOv2, Stable Diffusion, FLUX, diffusers, HuggingFace, CraftsMan3D, Michelangelo, Hunyuan-DiT, HunyuanVideo, HunyuanWorld-1.0, and HunyuanWorld-Voyager repositories for their open research and exploration.

Citation

If you use this code in your research, please cite:

@misc{hunyuan3d2025hunyuan3domni,
    title={Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets},
    author={Tencent Hunyuan3D Team},
    year={2025},
    eprint={2509.21245},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2509.21245}
}

@misc{hunyuan3d2025hunyuan3d,
    title={Hunyuan3D 2.1: From Images to High-Fidelity 3D Assets with Production-Ready PBR Material},
    author={Tencent Hunyuan3D Team},
    year={2025},
    eprint={2506.15442},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

@misc{hunyuan3d22025tencent,
    title={Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation},
    author={Tencent Hunyuan3D Team},
    year={2025},
    eprint={2501.12202},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

@misc{yang2024hunyuan3d,
    title={Hunyuan3D 1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
    author={Tencent Hunyuan3D Team},
    year={2024},
    eprint={2411.02293},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
