---
title: VAREdit-8B-512
emoji: 🚀
colorFrom: indigo
colorTo: purple
sdk: gradio
sdk_version: 5.27.0
app_file: app.py
pinned: false
license: mit
models:
- HiDream-ai/VAREdit
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# VAREdit

![VAREdit Demo](assets/demo.jpg)

[VAREdit](https://github.com/HiDream-ai/VAREdit) is a visual autoregressive image editing model built on the [Infinity](https://huggingface.co/FoundationVision/infinity) models and designed for high-quality instruction-based editing.

## 🌟 Key Features

- **Strong Instruction Following**: Follows editing instructions more faithfully, thanks to the autoregressive nature of the model.
- **Efficient Inference**: Optimized for fast generation, with under one second per edit for the 8B model at 512×512.
- **Flexible Resolution**: Supports 512×512 and 1024×1024 image resolutions.

![VAREdit Framework](assets/framework.jpg)

## 📊 Model Variants

| Model Variant    | Resolution | HuggingFace Model                                            | Time (H800) | VRAM (GB) |
|------------------|------------|--------------------------------------------------------------|-------------|-----------|
| VAREdit-8B-512   | 512×512    | [VAREdit-8B-512](https://huggingface.co/HiDream-ai/VAREdit)  | ~0.7s       | 50.41     |
| VAREdit-8B-1024  | 1024×1024  | [VAREdit-8B-1024](https://huggingface.co/HiDream-ai/VAREdit) | ~1.99s      | 50.41     |

## 🚀 Quick Start

### Prerequisites

Before starting, ensure you have the following (a minimal environment check is sketched after this list):
- Python 3.8+
- CUDA-compatible GPU with sufficient VRAM (8 GB+ for the 2B model; the released 8B models use roughly 50 GB, see the table above)
- Required dependencies installed
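
A minimal sketch of such a check, assuming PyTorch is available in your environment (it is installed via `requirements.txt`):

```python
# Minimal environment check: Python version, CUDA availability, and VRAM.
import sys

import torch

assert sys.version_info >= (3, 8), "Python 3.8+ is required"
assert torch.cuda.is_available(), "No CUDA-compatible GPU detected"

props = torch.cuda.get_device_properties(0)
print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
```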

### Installation

1. **Clone the repository**
```bash
git clone https://github.com/HiDream-ai/VAREdit.git
cd VAREdit
```

2. **Install dependencies**
```bash
pip install -r requirements.txt
```

3. **Download model checkpoints**

Download the VAREdit model checkpoints:
```bash
# Download from HuggingFace
git lfs install
git clone https://huggingface.co/HiDream-ai/VAREdit
```
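
Alternatively, if you prefer not to use `git lfs`, the same repository can likely be fetched with the `huggingface_hub` client (a sketch; files land in the local HF cache unless you pass `local_dir`):

```python
# Sketch: download the VAREdit checkpoints via huggingface_hub instead of git-lfs.
from huggingface_hub import snapshot_download

local_path = snapshot_download(repo_id="HiDream-ai/VAREdit")
print(f"Checkpoints available at: {local_path}")
```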

### Basic Usage

```python
from infer import load_model, generate_image

# Load the 8B model components for 1024x1024 editing
model_components = load_model(
    pretrain_root="HiDream-ai/VAREdit",
    model_path="HiDream-ai/VAREdit/8B-1024.pth",
    model_size="8B",
    image_size=1024
)

# Generate edited image
edited_image = generate_image(
    model_components,
    src_img_path="assets/test.jpg",
    instruction="Add glasses to this girl and change hair color to red",
    cfg=3.0,  # Classifier-free guidance scale
    tau=0.1,  # Temperature parameter
    seed=42  # Optional random seed
)
```
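
The return type of `generate_image` is not documented here; assuming it is a PIL-style image (an assumption, not confirmed by this README), the result can be saved directly:

```python
# Assumption: generate_image returns a PIL.Image-like object with a save() method.
edited_image.save("edited_output.jpg")
```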

## 📝 Detailed Configuration

### Model Sampling Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `cfg` | Classifier-free guidance scale | 3.0 |
| `tau` | Temperature for sampling | 1.0 |
| `seed` | Random seed for reproducibility | -1 (random) |
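
To see how these parameters are passed in practice, here is a small sketch that reuses `model_components` and `generate_image` from the Basic Usage example and sweeps the guidance scale while holding `tau` and `seed` fixed:

```python
# Sketch: compare several guidance scales on the same input with a fixed seed,
# so that cfg is the only variable between outputs.
for cfg in (1.5, 3.0, 6.0):
    edited = generate_image(
        model_components,
        src_img_path="assets/test.jpg",
        instruction="Change the hair color to red",
        cfg=cfg,   # higher values typically follow the instruction more strongly
        tau=1.0,   # default sampling temperature from the table above
        seed=42,   # fixed seed for a like-for-like comparison
    )
    edited.save(f"edited_cfg_{cfg}.jpg")  # assumes a PIL-style return value
```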

## 📂 Project Structure

```
VAREdit/
├── infer.py             # Main inference script
├── infinity/            # Core model implementations
│   ├── models/          # Model architectures
│   ├── dataset/         # Data processing utilities
│   └── utils/           # Helper functions
├── tools/               # Additional tools and scripts
│   └── run_infinity.py  # Model execution utilities
├── assets/              # Demo images and resources
└── README.md            # This file
```

## 📊 Performance Benchmarks

| **Method** | **Size** | **EMU-Edit Bal.** | **PIE-Bench Bal.** | **Time (A800)** |
|:---|:---:|:---:|:---:|:---:|
| InstructPix2Pix | 1.1B | 2.923 | 4.034 | 3.5s |
| UltraEdit | 7.7B | 4.541 | 5.580 | 2.6s |
| OmniGen | 3.8B | 4.674 | 3.492 | 16.5s |
| AnySD | 2.9B | 3.129 | 3.326 | 3.4s |
| EditAR | 0.8B | 3.305 | 4.707 | 45.5s |
| ACE++ | 16.9B | 2.076 | 2.574 | 5.7s |
| ICEdit | 17.0B | 4.785 | 4.933 | 8.4s |
| **VAREdit** (256px) | 2.2B | 5.565 | 6.684 | 0.5s |
| **VAREdit** (512px) | 2.2B | 5.662 | 6.996 | 0.7s |
| **VAREdit** (512px) | 8.4B | 7.7923 | 8.1055 | 1.2s |
| **VAREdit** (1024px) | 8.4B | 7.3797 | 7.6880 | 3.9s |

**Note**: The released 8B models were trained longer and on more data, so their performance is better than the numbers reported in the paper.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 📚 Citation

If you use VAREdit in your research, please cite:

```bibtex
@article{varedit2025,
  title={Visual Autoregressive Modeling for Instruction-Guided Image Editing},
  author={Mao, Qingyang and Cai, Qi and Li, Yehao and Pan, Yingwei and Cheng, Mingyue and Yao, Ting and Liu, Qi and Mei, Tao},
  journal={arXiv preprint},
  year={2025}
}
```

## 🙏 Acknowledgments

- Built on the [Infinity](https://huggingface.co/FoundationVision/infinity) models

**Note**: This project is under active development. Features and code may change.