1. Multi-character image generation with rich motion

2. Model structure preview

Overview
RaCig is designed to generate images based on textual prompts and reference images for characters (referred to as "Characters"). It leverages several models and techniques, including:
- Text-to-image retrieval (using CLIP)
- IP-Adapter for incorporating reference image features (face and body/clothes)
- ControlNet for pose/skeleton guidance
- Action Direction DINO for action direction recognition
- A pipeline (RaCigPipeline) to orchestrate the generation process.
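CLIP-based text-to-image retrieval usually embeds the prompt and each candidate image into a shared space, then ranks the candidates by cosine similarity. The sketch below covers only that ranking step over precomputed embeddings. The helper names, the toy 3-D vectors and the use of plain cosine similarity are assumptions made for illustration. This is not RaCig's actual retrieval code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    text_emb: Sequence[float],
    image_embs: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: rank candidate images by similarity to the prompt embedding."""
    scored = [(name, cosine(text_emb, emb)) for name, emb in image_embs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-D embeddings standing in for real CLIP features (which are much larger).
prompt = [0.9, 0.1, 0.0]
gallery = {
    "two_people_running.jpg": [0.8, 0.2, 0.1],
    "person_sitting.jpg": [0.1, 0.9, 0.2],
    "empty_street.jpg": [0.0, 0.1, 0.9],
}
print(retrieve_top_k(prompt, gallery, k=2))
```

A real pipeline would normalize the embeddings once and score the whole gallery with one matrix product. The loop here just keeps the ranking logic easy to read.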
The pipeline can handle multiple characters ("Characters") in a single scene, defined by their names, gender, and reference images (face and clothes).
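Each character is defined by a name, a gender and two reference images (face and clothes). One way to hold that per-scene description is a small dataclass.

Everything in the sketch below is hypothetical and not the real input format of RaCigPipeline or inference.py:

- the class name and field names;
- the rule that names must be unique, which assumes prompts refer to characters by name.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Character:
    """Hypothetical per-character spec: name, gender, and reference image paths."""
    name: str
    gender: str
    face_image: str     # path to the face reference (assumed field name)
    clothes_image: str  # path to the body/clothes reference (assumed field name)


def validate_scene(characters: List[Character]) -> None:
    """Assumption: characters are referenced by name, so names must be unique."""
    names = [c.name for c in characters]
    if len(names) != len(set(names)):
        raise ValueError(f"duplicate character names: {names}")


scene = [
    Character("Alice", "female", "refs/alice_face.png", "refs/alice_clothes.png"),
    Character("Bob", "male", "refs/bob_face.png", "refs/bob_clothes.png"),
]
validate_scene(scene)
```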
Installation
Clone the repository:
git clone https://github.com/ZulutionAI/RaCig.git
cd RaCig
Install dependencies:
pip install -r requirements.txt
Download the required models and retrieval datasets:
Models: https://huggingface.co/ZuluVision/RaCig
Place the models under ./models/ as follows:
./models/
├── action_direction_dino/
│   └── checkpoint_best_regular.pth
├── controlnet/
│   └── model.safetensors
├── image_encoder/
│   ├── config.json
│   ├── model.safetensors
│   └── pytorch_model.bin
├── ipa_weights/
│   ├── ip-adapter-plus-face_sdxl_vit-h.bin
│   └── ip-adapter-plus_sdxl_vit-h.bin
└── sdxl/
    └── dreamshaper.safetensors
Retrieval datasets: https://huggingface.co/datasets/ZuluVision/RaCig-Data
Place the datasets under ./data/ as follows:
./data/
├── MSDBv2_v7
├── Reelshot_retrieval
└── retrieve_info
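Before running inference, it can help to confirm that the downloads match the two trees above. The expected paths in this script are transcribed directly from those trees. The helper name missing_paths and the idea of a standalone check script are hypothetical; the repository itself may not ship such a check.

```python
from pathlib import Path
from typing import Iterable, List

# Expected entries, transcribed from the ./models/ and ./data/ trees above.
EXPECTED_MODEL_FILES = [
    "action_direction_dino/checkpoint_best_regular.pth",
    "controlnet/model.safetensors",
    "image_encoder/config.json",
    "image_encoder/model.safetensors",
    "image_encoder/pytorch_model.bin",
    "ipa_weights/ip-adapter-plus-face_sdxl_vit-h.bin",
    "ipa_weights/ip-adapter-plus_sdxl_vit-h.bin",
    "sdxl/dreamshaper.safetensors",
]
EXPECTED_DATA_DIRS = ["MSDBv2_v7", "Reelshot_retrieval", "retrieve_info"]


def missing_paths(root: Path, relative_paths: Iterable[str]) -> List[str]:
    """Hypothetical helper: return the entries of relative_paths that do not exist under root."""
    return [p for p in relative_paths if not (root / p).exists()]


if __name__ == "__main__":
    missing = [f"models/{p}" for p in missing_paths(Path("./models"), EXPECTED_MODEL_FILES)]
    missing += [f"data/{d}" for d in missing_paths(Path("./data"), EXPECTED_DATA_DIRS)]
    print("All files in place." if not missing else "Missing:\n" + "\n".join(missing))
```

Run it from the repository root, since both paths are relative to the current directory.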
Usage
Inference
- Run inference:
python inference.py
- Generated images, retrieved images, and skeleton visualizations are saved in the output/ directory by default.
Gradio
python run_gradio.py
For more detailed instructions, see Gradio Interface Instructions (EN) or Gradio Interface Instructions (Chinese).
Training
We train only the ControlNet, so that it better recognizes the fused feature map. Once IP-Adapter information has been injected, the fused feature map makes it hard for the ControlNet to constrain the pose, so we lightly fine-tune it.
The fine-tuning uses the retrieval dataset, organized in the ./data/ structure shown above.
bash train.sh
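The "fused feature map" in the training note comes from IP-Adapter's decoupled cross-attention. The same queries attend separately to text tokens and to reference-image tokens, and the two results are summed, with the image branch weighted by a scale λ.

The NumPy sketch below shows that sum using random toy tensors. It simplifies in several ways that are assumptions, not RaCig's setup:

- the shapes and the scale value are arbitrary;
- the learned key/value projections are omitted;
- it is not RaCig's training code.

```python
import numpy as np


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention: softmax(q k^T / sqrt(d)) v."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


def decoupled_cross_attention(q, k_text, v_text, k_img, v_img, ip_scale=1.0):
    """Text attention plus ip_scale times image attention, sharing the same queries."""
    return attention(q, k_text, v_text) + ip_scale * attention(q, k_img, v_img)


rng = np.random.default_rng(0)
d = 8
q = rng.normal(size=(16, d))  # toy stand-in for 16 UNet spatial query tokens
k_text, v_text = rng.normal(size=(77, d)), rng.normal(size=(77, d))  # toy text tokens
k_img, v_img = rng.normal(size=(4, d)), rng.normal(size=(4, d))      # toy image tokens

fused = decoupled_cross_attention(q, k_text, v_text, k_img, v_img, ip_scale=0.6)
print(fused.shape)
```

Setting ip_scale to 0 recovers plain text cross-attention. Larger values give the reference-image branch more weight in the fused output, which is one intuition for why a stock ControlNet can struggle to constrain the pose on top of it.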
Contributing
Acknowledgements
This project is based on the work of the following open-source projects and contributors:
- IP-Adapter - Image Prompt Adapter developed by Tencent AI Lab
- xiaohu2015
Model tree for ZuluVision/RaCig
Base model: stabilityai/stable-diffusion-xl-base-1.0