---
title: VAREdit-8B-512
emoji: 🚀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: mit
models:
  - HiDream-ai/VAREdit
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

VAREdit


VAREdit is an advanced image editing model built on the Infinity visual autoregressive models and designed for high-quality instruction-based image editing.

🌟 Key Features

  • Strong Instruction Following: Follows editing instructions more accurately, owing to the model's autoregressive nature.
  • Efficient Inference: Optimized for fast generation; the 8B model edits a 512×512 image in under 1 second.
  • Flexible Resolution: Supports 512×512 and 1024×1024 image resolutions.

πŸ“Š Model Variants

| Model Variant   | Resolution | HuggingFace Model | Time (H800) | VRAM (GB) |
|-----------------|------------|-------------------|-------------|-----------|
| VAREdit-8B-512  | 512×512    | VAREdit-8B-512    | ~0.7s       | 50.41     |
| VAREdit-8B-1024 | 1024×1024  | VAREdit-8B-1024   | ~1.99s      | 50.41     |
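
Both variants are loaded through the same `load_model` entry point shown in Basic Usage below; only the checkpoint path and `image_size` differ. A minimal sketch of selecting a variant (the `8B-512.pth` filename is an assumption, inferred by analogy with the `8B-1024.pth` name used in the usage example):

```python
# Hypothetical mapping from output resolution to load_model settings.
# The 8B-512.pth filename is assumed by analogy with 8B-1024.pth.
VARIANTS = {
    512: {"model_path": "HiDream-ai/VAREdit/8B-512.pth", "image_size": 512},
    1024: {"model_path": "HiDream-ai/VAREdit/8B-1024.pth", "image_size": 1024},
}

def variant_for(resolution: int) -> dict:
    """Return load_model keyword arguments for a supported resolution."""
    if resolution not in VARIANTS:
        raise ValueError(f"Unsupported resolution {resolution}; choose 512 or 1024.")
    return VARIANTS[resolution]
```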

πŸš€ Quick Start

Prerequisites

Before starting, ensure you have:

  • Python 3.8+
  • CUDA-compatible GPU with sufficient VRAM (8GB+ for the 2B model, 24GB+ for the 8B model); a quick check is sketched after this list
  • Required dependencies installed
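
Before installing, it can help to confirm that a CUDA device is visible and how much memory it offers. A minimal check with PyTorch (which VAREdit already depends on):

```python
import torch

# Confirm a CUDA device is available and report its total memory.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device found; VAREdit requires a GPU.")

props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
```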

Installation

  1. Clone the repository

```bash
git clone https://github.com/HiDream-ai/VAREdit.git
cd VAREdit
```

  2. Install dependencies

```bash
pip install -r requirements.txt
```

  3. Download model checkpoints

Download the VAREdit model checkpoints:

```bash
# Download from HuggingFace
git lfs install
git clone https://huggingface.co/HiDream-ai/VAREdit
```
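
If you prefer not to use Git LFS, the same repository can be fetched with the `huggingface_hub` Python library; a minimal sketch using the default cache location:

```python
from huggingface_hub import snapshot_download

# Download the full HiDream-ai/VAREdit repository into the local
# Hugging Face cache and print the snapshot path.
local_dir = snapshot_download(repo_id="HiDream-ai/VAREdit")
print(local_dir)
```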

Basic Usage

```python
from infer import load_model, generate_image

# Load the 8B model at 1024×1024 resolution
model_components = load_model(
    pretrain_root="HiDream-ai/VAREdit",
    model_path="HiDream-ai/VAREdit/8B-1024.pth",
    model_size="8B",
    image_size=1024
)

# Generate edited image
edited_image = generate_image(
    model_components,
    src_img_path="assets/test.jpg",
    instruction="Add glasses to this girl and change hair color to red",
    cfg=3.0,  # Classifier-free guidance scale
    tau=0.1,  # Sampling temperature
    seed=42   # Optional random seed for reproducibility
)
```
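
The README does not document the return type of `generate_image`; assuming it returns a PIL-style image object with a `.save()` method, applying several instructions to the same source image might look like this:

```python
# Hypothetical batch loop over several instructions; assumes
# generate_image returns a PIL.Image-like object with .save().
instructions = [
    "Add glasses to this girl",
    "Change hair color to red",
    "Make the background a beach at sunset",
]

for i, instruction in enumerate(instructions):
    edited = generate_image(
        model_components,
        src_img_path="assets/test.jpg",
        instruction=instruction,
        cfg=3.0,
        tau=0.1,
        seed=42,
    )
    edited.save(f"edited_{i}.jpg")
```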

πŸ“ Detailed Configuration

Model Sampling Parameters

| Parameter | Description                     | Default     |
|-----------|---------------------------------|-------------|
| cfg       | Classifier-free guidance scale  | 3.0         |
| tau       | Temperature for sampling        | 1.0         |
| seed      | Random seed for reproducibility | -1 (random) |
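
Higher `cfg` pushes the output to follow the instruction more strongly, while `tau` controls sampling randomness. A small grid sweep, reusing the names from Basic Usage, is one way to pick values for a given image (the value grids here are illustrative, not recommendations):

```python
import itertools

# Illustrative sweep over guidance scale and sampling temperature,
# with a fixed seed so only cfg and tau vary between outputs.
for cfg, tau in itertools.product([2.0, 3.0, 4.0], [0.1, 0.5, 1.0]):
    edited = generate_image(
        model_components,
        src_img_path="assets/test.jpg",
        instruction="Change hair color to red",
        cfg=cfg,
        tau=tau,
        seed=42,
    )
    edited.save(f"sweep_cfg{cfg}_tau{tau}.jpg")
```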

πŸ“‚ Project Structure

```
VAREdit/
├── infer.py              # Main inference script
├── infinity/             # Core model implementations
│   ├── models/          # Model architectures
│   ├── dataset/         # Data processing utilities
│   └── utils/           # Helper functions
├── tools/               # Additional tools and scripts
│   └── run_infinity.py  # Model execution utilities
├── assets/              # Demo images and resources
└── README.md            # This file
```

πŸ“Š Performance Benchmarks

| Method           | Size  | EMU-Edit Bal. | PIE-Bench Bal. | Time (A800) |
|------------------|-------|---------------|----------------|-------------|
| InstructPix2Pix  | 1.1B  | 2.923         | 4.034          | 3.5s        |
| UltraEdit        | 7.7B  | 4.541         | 5.580          | 2.6s        |
| OmniGen          | 3.8B  | 4.674         | 3.492          | 16.5s       |
| AnySD            | 2.9B  | 3.129         | 3.326          | 3.4s        |
| EditAR           | 0.8B  | 3.305         | 4.707          | 45.5s       |
| ACE++            | 16.9B | 2.076         | 2.574          | 5.7s        |
| ICEdit           | 17.0B | 4.785         | 4.933          | 8.4s        |
| VAREdit (256px)  | 2.2B  | 5.565         | 6.684          | 0.5s        |
| VAREdit (512px)  | 2.2B  | 5.662         | 6.996          | 0.7s        |
| VAREdit (512px)  | 8.4B  | 7.7923        | 8.1055         | 1.2s        |
| VAREdit (1024px) | 8.4B  | 7.3797        | 7.6880         | 3.9s        |

Note: The released 8B models were trained longer and on more data, so their performance is better than the numbers reported in the paper.

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ“š Citation

If you use VAREdit in your research, please cite:

```bibtex
@article{varedit2025,
  title={Visual Autoregressive Modeling for Instruction-Guided Image Editing},
  author={Mao, Qingyang and Cai, Qi and Li, Yehao and Pan, Yingwei and Cheng, Mingyue and Yao, Ting and Liu, Qi and Mei, Tao},
  journal={arXiv preprint},
  year={2025}
}
```

πŸ™ Acknowledgments

Note: This project is under active development. Features and code may change.