metadata
title: VAREdit-8B-512
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: mit
models:
- HiDream-ai/VAREdit
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
VAREdit
VAREdit is an autoregressive image editing model built on the Infinity models and designed for high-quality, instruction-based editing.
Key Features
- Strong Instruction Following: Follows instructions more accurately due to the autoregressive nature of the model.
- Efficient Inference: Optimized for fast generation, with the 8B model taking less than 1 second at 512×512 (~0.7s on an H800).
- Flexible Resolution: Supports 512×512 and 1024×1024 image resolutions.
Model Variants
Model Variant | Resolutions | HuggingFace Model | Time (H800) | VRAM (GB) |
---|---|---|---|---|
VAREdit-8B-512 | 512×512 | VAREdit-8B-512 | ~0.7s | 50.41 |
VAREdit-8B-1024 | 1024×1024 | VAREdit-8B-1024 | ~1.99s | 50.41 |
Quick Start
Prerequisites
Before starting, ensure you have:
- Python 3.8+
- CUDA-compatible GPU with sufficient VRAM (8GB+ for 2B model, 24GB+ for 8B model)
- Required dependencies installed
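The prerequisites above can be checked with a short script before installing anything. This is a minimal sketch, not part of the repository: the function name `check_environment` is hypothetical, the 24 GB threshold is the 8B figure from the list above, and it assumes PyTorch is the framework in use (it reports rather than fails if `torch` is missing).

```python
import sys


def check_environment(min_python=(3, 8), min_vram_gb=24.0):
    """Return a dict describing whether this machine meets the listed prerequisites."""
    report = {
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "torch_installed": False,
        "cuda_available": False,
        "vram_gb": None,
        "vram_ok": False,
    }
    try:
        # Assumption: the project runs on PyTorch; imported lazily so the
        # check still works on a machine where it is not installed yet.
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    if torch.cuda.is_available():
        report["cuda_available"] = True
        # total_memory is reported in bytes for the first visible GPU.
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        report["vram_gb"] = round(total_bytes / 1024**3, 2)
        report["vram_ok"] = report["vram_gb"] >= min_vram_gb
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

Pass `min_vram_gb=8.0` to check against the 2B requirement instead.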
Installation
- Clone the repository
git clone https://github.com/HiDream-ai/VAREdit.git
cd VAREdit
- Install dependencies
pip install -r requirements.txt
- Download model checkpoints
Download the VAREdit model checkpoints:
# Download from HuggingFace
git lfs install
git clone https://huggingface.co/HiDream-ai/VAREdit
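If `git lfs` is not available, the same repository can probably be fetched from Python with the `huggingface_hub` package. The sketch below is an alternative, not the documented route: `download_checkpoints` is a hypothetical helper, and the default `local_dir` is an assumption chosen only so that it lines up with the paths used under Basic Usage.

```python
def download_checkpoints(local_dir="HiDream-ai/VAREdit", repo_id="HiDream-ai/VAREdit"):
    """Download the model repository from the Hugging Face Hub and return its local path."""
    # Imported lazily so this file can be loaded without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # local_dir is an assumed target directory; adjust it to your own layout.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Example call (downloads every file in the repository, which may be large):
#   path = download_checkpoints()
#   print(path)
```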
Basic Usage
from infer import load_model, generate_image
model_components = load_model(
    pretrain_root="HiDream-ai/VAREdit",
    model_path="HiDream-ai/VAREdit/8B-1024.pth",
    model_size="8B",
    image_size=1024
)
# Generate edited image
edited_image = generate_image(
    model_components,
    src_img_path="assets/test.jpg",
    instruction="Add glasses to this girl and change hair color to red",
    cfg=3.0,  # Classifier-free guidance scale
    tau=0.1,  # Temperature parameter
    seed=42   # Optional random seed
)
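To try several instructions on the same source image, the call above can be wrapped in a loop. The following is a sketch under stated assumptions: `run_edits` is a hypothetical helper, it passes the generation function in as an argument so it can be exercised without the model, and it assumes `generate_image` accepts exactly the keyword arguments shown in the example above. What `generate_image` returns is not stated here, so the results are stored as-is.

```python
def run_edits(generate_fn, model_components, src_img_path, instructions,
              cfg=3.0, tau=0.1, seed=42):
    """Apply each instruction to the same source image and collect the results."""
    results = {}
    for instruction in instructions:
        # Same keyword arguments as the single-call example; a fixed seed
        # keeps the runs comparable across instructions.
        results[instruction] = generate_fn(
            model_components,
            src_img_path=src_img_path,
            instruction=instruction,
            cfg=cfg,
            tau=tau,
            seed=seed,
        )
    return results


# Intended use with the real function (needs the repository and checkpoints):
#   from infer import load_model, generate_image
#   outputs = run_edits(generate_image, model_components, "assets/test.jpg",
#                       ["Add glasses to this girl", "Change hair color to red"])
```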
Detailed Configuration
Model Sampling Parameters
Parameter | Description | Default |
---|---|---|
cfg | Classifier-free guidance scale | 3.0 |
tau | Temperature for sampling | 1.0 |
seed | Random seed for reproducibility | -1 (random) |
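For intuition about what these parameters usually do, here is a toy sketch of the common textbook formulations of classifier-free guidance and temperature sampling on a three-way distribution. It is illustrative only and is not taken from the VAREdit code, whose internals may differ; the function names are hypothetical.

```python
import math
import random


def apply_cfg(cond_logits, uncond_logits, cfg):
    """Push logits away from the unconditional prediction by the guidance scale."""
    return [u + cfg * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax_with_temperature(logits, tau):
    """Convert logits to probabilities; a smaller tau sharpens the distribution."""
    scaled = [x / tau for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def resolve_seed(seed):
    """Treat -1 as 'draw a random seed', mirroring the documented default."""
    return random.randrange(2**32) if seed == -1 else seed


cond = [2.0, 1.0, 0.0]    # logits given the instruction (toy values)
uncond = [1.0, 1.0, 1.0]  # logits without the instruction (toy values)
guided = apply_cfg(cond, uncond, cfg=3.0)

soft = softmax_with_temperature(guided, tau=1.0)
sharp = softmax_with_temperature(guided, tau=0.1)
print(guided, soft[0] < sharp[0])
```

In this formulation a larger `cfg` exaggerates the difference the instruction makes, and a smaller `tau` concentrates probability on the top choice, which makes outputs more deterministic.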
Project Structure
VAREdit/
├── infer.py # Main inference script
├── infinity/ # Core model implementations
│   ├── models/ # Model architectures
│   ├── dataset/ # Data processing utilities
│   └── utils/ # Helper functions
├── tools/ # Additional tools and scripts
│   └── run_infinity.py # Model execution utilities
├── assets/ # Demo images and resources
└── README.md # This file
Performance Benchmarks
Method | Size | EMU-Edit Bal. | PIE-Bench Bal. | Time (A800) |
---|---|---|---|---|
InstructPix2Pix | 1.1B | 2.923 | 4.034 | 3.5s |
UltraEdit | 7.7B | 4.541 | 5.580 | 2.6s |
OmniGen | 3.8B | 4.674 | 3.492 | 16.5s |
AnySD | 2.9B | 3.129 | 3.326 | 3.4s |
EditAR | 0.8B | 3.305 | 4.707 | 45.5s |
ACE++ | 16.9B | 2.076 | 2.574 | 5.7s |
ICEdit | 17.0B | 4.785 | 4.933 | 8.4s |
VAREdit (256px) | 2.2B | 5.565 | 6.684 | 0.5s |
VAREdit (512px) | 2.2B | 5.662 | 6.996 | 0.7s |
VAREdit (512px) | 8.4B | 7.7923 | 8.1055 | 1.2s |
VAREdit (1024px) | 8.4B | 7.3797 | 7.6880 | 3.9s |
Note: The released 8B models were trained longer and on more data, so their scores are higher than those reported in the paper.
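To compare latency on your own hardware with the Time column, a small timing harness can help. The table does not say how its times were measured, so this is only one reasonable approach: `time_calls` is a hypothetical helper that discards warm-up calls and reports the median wall-clock time, with an optional `sync` hook because GPU work is asynchronous.

```python
import statistics
import time


def time_calls(fn, warmup=1, runs=5, sync=None):
    """Return the median wall-clock seconds per call of fn()."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    durations = []
    for _ in range(runs):
        if sync is not None:
            sync()  # make sure earlier GPU work has finished
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this call's GPU work before stopping the clock
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


# Intended use with the real model (assumes PyTorch on a CUDA device):
#   import torch
#   seconds = time_calls(
#       lambda: generate_image(model_components, src_img_path="assets/test.jpg",
#                              instruction="Add glasses to this girl",
#                              cfg=3.0, tau=0.1, seed=42),
#       sync=torch.cuda.synchronize,
#   )
```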
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use VAREdit in your research, please cite:
@article{varedit2025,
title={Visual Autoregressive Modeling for Instruction-Guided Image Editing},
author={Mao, Qingyang and Cai, Qi and Li, Yehao and Pan, Yingwei and Cheng, Mingyue and Yao, Ting and Liu, Qi and Mei, Tao},
journal={arXiv preprint},
year={2025}
}
Acknowledgments
- Built on the Infinity models
Note: This project is under active development. Features and code may change.