---
title: VAREdit-8B-512
emoji: π
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: mit
models:
- HiDream-ai/VAREdit
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# VAREdit

[VAREdit](https://github.com/HiDream-ai/VAREdit) is an advanced image editing model built on the [Infinity](https://huggingface.co/FoundationVision/infinity) models, designed for high-quality instruction-based image editing.
## Key Features
- **Strong Instruction Following**: Follows instructions more accurately due to the autoregressive nature of the model.
- **Efficient Inference**: Generates an edit in under 1 second with the 8B model at 512×512 (about 0.7 s on an H800).
- **Flexible Resolution**: Supports 512×512 and 1024×1024 image resolutions.

## Model Variants
| Model Variant | Resolution | HuggingFace Model | Time (H800) | VRAM (GB) |
|-----------------|-----------|--------------------------------------------------------------|--------|-------|
| VAREdit-8B-512 | 512×512 | [VAREdit-8B-512](https://huggingface.co/HiDream-ai/VAREdit) | ~0.7s | 50.41 |
| VAREdit-8B-1024 | 1024×1024 | [VAREdit-8B-1024](https://huggingface.co/HiDream-ai/VAREdit) | ~1.99s | 50.41 |
## Quick Start
### Prerequisites
Before starting, ensure you have:
- Python 3.8+
- CUDA-compatible GPU with sufficient VRAM (8GB+ for 2B model, 24GB+ for 8B model)
- Required dependencies installed
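
The prerequisites above can be checked with a short script before installing anything else. This is a minimal sketch, not part of the VAREdit repository: the helper names `python_ok` and `gpu_summary` are hypothetical, and it assumes PyTorch is the framework pulled in by `requirements.txt`.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version from the prerequisites list


def python_ok(version_info=sys.version_info):
    """Return True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def gpu_summary():
    """Return (device name, total VRAM in GiB) for CUDA device 0, or None."""
    try:
        import torch  # assumption: PyTorch is installed
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    props = torch.cuda.get_device_properties(0)
    return props.name, props.total_memory / 1024**3


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("GPU:", gpu_summary())
```

Compare the reported VRAM with the requirement for the model size you plan to run.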
### Installation
1. **Clone the repository**
```bash
git clone https://github.com/HiDream-ai/VAREdit.git
cd VAREdit
```
2. **Install dependencies**
```bash
pip install -r requirements.txt
```
3. **Download model checkpoints**
Download the VAREdit model checkpoints:
```bash
# Download from HuggingFace
git lfs install
git clone https://huggingface.co/HiDream-ai/VAREdit
```
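
If `git lfs` is not available, the same repository can probably be fetched with the `huggingface_hub` Python package instead. The sketch below assumes that package is installed; the function name `download_checkpoints` and the target directory `VAREdit-checkpoints` are hypothetical choices for illustration.

```python
def download_checkpoints(local_dir="VAREdit-checkpoints"):
    """Download the HiDream-ai/VAREdit model repo and return the local path."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id="HiDream-ai/VAREdit", local_dir=local_dir)


if __name__ == "__main__":
    # Downloads every file in the model repo; this needs network access
    # and enough disk space for the checkpoints.
    print(download_checkpoints())
```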
### Basic Usage
```python
from infer import load_model, generate_image

# Load the model components for the 8B model at 1024x1024
model_components = load_model(
    pretrain_root="HiDream-ai/VAREdit",
    model_path="HiDream-ai/VAREdit/8B-1024.pth",
    model_size="8B",
    image_size=1024,
)

# Generate edited image
edited_image = generate_image(
    model_components,
    src_img_path="assets/test.jpg",
    instruction="Add glasses to this girl and change hair color to red",
    cfg=3.0,   # Classifier-free guidance scale
    tau=0.1,   # Temperature parameter
    seed=42,   # Optional random seed
)
```
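
To compare sampling settings on the same input, the call above can be wrapped in a small grid loop. This is a sketch only: `sweep_settings` is a hypothetical helper, the grid values are illustrative, and it assumes `generate_image` accepts the keyword arguments shown in the example above.

```python
from itertools import product


def sweep_settings(generate_fn, model_components, src_img_path, instruction,
                   cfgs=(2.0, 3.0, 4.0), taus=(0.1, 0.5, 1.0), seed=42):
    """Call generate_fn once per (cfg, tau) pair and collect the results."""
    results = {}
    for cfg, tau in product(cfgs, taus):
        # The seed stays fixed so that only cfg and tau vary between runs.
        results[(cfg, tau)] = generate_fn(
            model_components,
            src_img_path=src_img_path,
            instruction=instruction,
            cfg=cfg,
            tau=tau,
            seed=seed,
        )
    return results
```

Pass `generate_image` as `generate_fn` along with the loaded `model_components`; the returned dictionary is keyed by `(cfg, tau)`.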
## Detailed Configuration
### Model Sampling Parameters
| Parameter | Description | Default |
|-----------|-------------|---------|
| `cfg` | Classifier-free guidance scale | 3.0 |
| `tau` | Temperature for sampling | 1.0 |
| `seed` | Random seed for reproducibility | -1 (random) |
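
For intuition, the sketch below shows how a guidance scale and a temperature typically act on token logits in autoregressive sampling. It is a generic textbook formulation written for illustration, not VAREdit's actual sampling code, which may combine these parameters differently; all function names are hypothetical.

```python
import math
import random


def guided_logits(cond, uncond, cfg):
    """Classifier-free guidance: move logits away from the unconditional ones."""
    return [u + cfg * (c - u) for c, u in zip(cond, uncond)]


def softmax_with_temperature(logits, tau):
    """Softmax of logits / tau; a smaller tau gives a sharper distribution."""
    scaled = [x / tau for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(cond, uncond, cfg=3.0, tau=1.0, seed=-1):
    """Sample one token index; seed=-1 means a fresh random seed."""
    rng = random.Random(None if seed == -1 else seed)
    probs = softmax_with_temperature(guided_logits(cond, uncond, cfg), tau)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this formulation, `cfg=1.0` reproduces the conditional logits unchanged, larger values weight the instruction more heavily, and lowering `tau` makes sampling closer to greedy decoding.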
## Project Structure
```
VAREdit/
├── infer.py              # Main inference script
├── infinity/             # Core model implementations
│   ├── models/           # Model architectures
│   ├── dataset/          # Data processing utilities
│   └── utils/            # Helper functions
├── tools/                # Additional tools and scripts
│   └── run_infinity.py   # Model execution utilities
├── assets/               # Demo images and resources
└── README.md             # This file
```
## Performance Benchmarks
| **Method** | **Size** | **EMU-Edit Bal.** | **PIE-Bench Bal.** | **Time (A800)** |
|:---|:---:|:---:|:---:|:---:|
| InstructPix2Pix | 1.1B | 2.923 | 4.034 | 3.5s |
| UltraEdit | 7.7B | 4.541 | 5.580 | 2.6s |
| OmniGen | 3.8B | 4.674 | 3.492 | 16.5s |
| AnySD | 2.9B | 3.129 | 3.326 | 3.4s |
| EditAR | 0.8B | 3.305 | 4.707 | 45.5s |
| ACE++ | 16.9B | 2.076 | 2.574 | 5.7s |
| ICEdit | 17.0B | 4.785 | 4.933 | 8.4s |
| **VAREdit** (256px) | 2.2B | 5.565 | 6.684 | 0.5s |
| **VAREdit** (512px) | 2.2B | 5.662 | 6.996 | 0.7s |
| **VAREdit** (512px) | 8.4B | 7.7923 | 8.1055 | 1.2s |
| **VAREdit** (1024px) | 8.4B | 7.3797 | 7.6880 | 3.9s |

**Note**: The released 8B models were trained longer and on more data, so their scores are higher than those reported in the paper.
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## Citation
If you use VAREdit in your research, please cite:
```bibtex
@article{varedit2025,
title={Visual Autoregressive Modeling for Instruction-Guided Image Editing},
author={Mao, Qingyang and Cai, Qi and Li, Yehao and Pan, Yingwei and Cheng, Mingyue and Yao, Ting and Liu, Qi and Mei, Tao},
journal={arXiv preprint},
year={2025}
}
```
## Acknowledgments
- Built on the [Infinity](https://huggingface.co/FoundationVision/infinity) models

**Note**: This project is under active development. Features and code may change.