
Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis

Ant Group


✨ For more results, visit our Project Page ✨

πŸ“Œ Updates

  • [2025.01.10] 🔥 We have released our inference code and models.
  • [2024.11.29] 🔥 Our paper is available on arXiv.

πŸ› οΈ Installation

Tested Environment

  • System: CentOS 7.2
  • GPU: A100
  • Python: 3.10
  • TensorRT: 8.6.1

Clone the code from GitHub:

git clone https://github.com/antgroup/ditto-talkinghead
cd ditto-talkinghead

Create conda environment:

conda env create -f environment.yaml
conda activate ditto
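
Optionally, verify the environment with a quick Python check. This is a minimal sketch; it assumes environment.yaml installs the TensorRT Python bindings (the tested setup uses TensorRT 8.6.1).

# Minimal sanity check for the "ditto" conda environment.
# Assumption: the TensorRT Python bindings are installed by environment.yaml.
import tensorrt as trt

print("TensorRT version:", trt.__version__)  # expected: 8.6.1 in the tested environment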

πŸ“₯ Download Checkpoints

Download the checkpoints from Hugging Face and put them in the checkpoints directory:

git lfs install
git clone https://huggingface.co/digital-avatar/ditto-talkinghead checkpoints
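
If you prefer not to use git-lfs, the same repository can be fetched with the huggingface_hub Python library. This is a minimal sketch; it assumes the huggingface_hub package is available in your environment.

# Alternative download path (assumption: the huggingface_hub package is installed).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="digital-avatar/ditto-talkinghead",
    local_dir="checkpoints",
)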

The checkpoints should be organized as follows:

./checkpoints/
β”œβ”€β”€ ditto_cfg
β”‚   β”œβ”€β”€ v0.4_hubert_cfg_trt.pkl
β”‚   └── v0.4_hubert_cfg_trt_online.pkl
β”œβ”€β”€ ditto_onnx
β”‚   β”œβ”€β”€ appearance_extractor.onnx
β”‚   β”œβ”€β”€ blaze_face.onnx
β”‚   β”œβ”€β”€ decoder.onnx
β”‚   β”œβ”€β”€ face_mesh.onnx
β”‚   β”œβ”€β”€ hubert.onnx
β”‚   β”œβ”€β”€ insightface_det.onnx
β”‚   β”œβ”€β”€ landmark106.onnx
β”‚   β”œβ”€β”€ landmark203.onnx
β”‚   β”œβ”€β”€ libgrid_sample_3d_plugin.so
β”‚   β”œβ”€β”€ lmdm_v0.4_hubert.onnx
β”‚   β”œβ”€β”€ motion_extractor.onnx
β”‚   β”œβ”€β”€ stitch_network.onnx
β”‚   └── warp_network.onnx
└── ditto_trt_Ampere_Plus
    β”œβ”€β”€ appearance_extractor_fp16.engine
    β”œβ”€β”€ blaze_face_fp16.engine
    β”œβ”€β”€ decoder_fp16.engine
    β”œβ”€β”€ face_mesh_fp16.engine
    β”œβ”€β”€ hubert_fp32.engine
    β”œβ”€β”€ insightface_det_fp16.engine
    β”œβ”€β”€ landmark106_fp16.engine
    β”œβ”€β”€ landmark203_fp16.engine
    β”œβ”€β”€ lmdm_v0.4_hubert_fp32.engine
    β”œβ”€β”€ motion_extractor_fp32.engine
    β”œβ”€β”€ stitch_network_fp16.engine
    └── warp_network_fp16.engine
  • The ditto_cfg/v0.4_hubert_cfg_trt_online.pkl is the online config.
  • The ditto_cfg/v0.4_hubert_cfg_trt.pkl is the offline config.
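
Before running inference, you can sanity-check the layout with a short Python sketch. The file names below mirror the tree above; adjust the list if the release contents change.

# Minimal checkpoint-layout check; the paths mirror the tree above.
from pathlib import Path

root = Path("./checkpoints")
expected = [
    "ditto_cfg/v0.4_hubert_cfg_trt.pkl",
    "ditto_cfg/v0.4_hubert_cfg_trt_online.pkl",
    "ditto_onnx/hubert.onnx",
    "ditto_trt_Ampere_Plus/hubert_fp32.engine",
]
missing = [p for p in expected if not (root / p).exists()]
print("missing checkpoint files:", missing or "none")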

πŸš€ Inference

Run inference.py:

python inference.py \
    --data_root "<path-to-trt-model>" \
    --cfg_pkl "<path-to-cfg-pkl>" \
    --audio_path "<path-to-input-audio>" \
    --source_path "<path-to-input-image>" \
    --output_path "<path-to-output-mp4>" 

For example:

python inference.py \
    --data_root "./checkpoints/ditto_trt_Ampere_Plus" \
    --cfg_pkl "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl" \
    --audio_path "./example/audio.wav" \
    --source_path "./example/image.png" \
    --output_path "./tmp/result.mp4" 
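
To process several audio clips with the same source image, a small wrapper around the command above can help. This is a minimal sketch: the audio folder and output directory are placeholder paths, and only the CLI flags shown above are used.

# Minimal batch wrapper around inference.py (placeholder paths; flags as shown above).
import subprocess
from pathlib import Path

audio_dir = Path("./example/audios")   # hypothetical folder containing .wav files
out_dir = Path("./tmp")
out_dir.mkdir(parents=True, exist_ok=True)

for wav in sorted(audio_dir.glob("*.wav")):
    subprocess.run([
        "python", "inference.py",
        "--data_root", "./checkpoints/ditto_trt_Ampere_Plus",
        "--cfg_pkl", "./checkpoints/ditto_cfg/v0.4_hubert_cfg_trt.pkl",
        "--audio_path", str(wav),
        "--source_path", "./example/image.png",
        "--output_path", str(out_dir / f"{wav.stem}.mp4"),
    ], check=True)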

❗Note:

We provide TensorRT engines built with hardware-compatibility-level=Ampere_Plus (checkpoints/ditto_trt_Ampere_Plus/). If your GPU does not support them, run the cvt_onnx_to_trt.py script to convert the general ONNX models (checkpoints/ditto_onnx/) into TensorRT engines for your hardware.

python script/cvt_onnx_to_trt.py --onnx_dir "./checkpoints/ditto_onnx" --trt_dir "./checkpoints/ditto_trt_custom"

Then run inference.py with --data_root=./checkpoints/ditto_trt_custom.
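
If you are unsure whether your GPU is covered by the prebuilt Ampere_Plus engines, checking the CUDA compute capability can help you decide whether the conversion step above is needed. This is a minimal sketch that assumes PyTorch is installed; TensorRT's Ampere_Plus hardware-compatibility level targets Ampere (compute capability 8.0) and newer GPUs.

# Minimal check for Ampere_Plus engine compatibility (assumption: PyTorch is installed).
import torch

major, minor = torch.cuda.get_device_capability(0)
if (major, minor) >= (8, 0):
    print("Ampere or newer GPU: the prebuilt engines in ./checkpoints/ditto_trt_Ampere_Plus should work.")
else:
    print("Pre-Ampere GPU: rebuild engines with script/cvt_onnx_to_trt.py and use --data_root ./checkpoints/ditto_trt_custom.")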

πŸ“§ Acknowledgement

Our implementation is based on S2G-MDDiffusion and LivePortrait. Thanks for their remarkable contributions and released code! If we have missed any open-source projects or related articles, we will gladly update this acknowledgement.

βš–οΈ License

This repository is released under the Apache-2.0 license as found in the LICENSE file.

πŸ“š Citation

If you find this codebase useful for your research, please cite our work using the following BibTeX entry.

@article{li2024ditto,
    title={Ditto: Motion-Space Diffusion for Controllable Realtime Talking Head Synthesis},
    author={Li, Tianqi and Zheng, Ruobing and Yang, Minghui and Chen, Jingdong and Yang, Ming},
    journal={arXiv preprint arXiv:2411.19509},
    year={2024}
}