ziyun.zeng committed · Commit b43620c · 1 Parent(s): 2008b8d

update README

README.md CHANGED
@@ -16,6 +16,14 @@ tags:
 
+## 📰 News
+
+**[2025-10-08]** We release the **DIM-Edit** dataset and the **DIM-4.6B-T2I** / **DIM-4.6B-Edit** models.
+
+**[2025-09-26]** We upload a new version of the paper, including more results across various designers.
+
+**[2025-09-02]** The **DIM** paper is released.
+
 ## Introduction
 
 Unified models achieve strong results in text-to-image generation but remain weak in precise editing. This limitation
@@ -116,14 +124,6 @@ models trained on different data corpora.
 
 </details>
 
-## Open-Source Plan
-
-- [x] DIM Paper
-- [x] DIM-4.6B-T2I
-- [x] DIM-4.6B-Edit
-- [x] DIM-Edit Data
-- [ ] DIM-T2I Data
-
 ## Dataset Usage
 
 ### DIM-T2I
@@ -158,13 +158,35 @@ tar -xvzf images.tar.gz
 In the meantime, you will find a JSONL file named `tos_dataset_edit.jsonl` in the root directory, which records all
 image editing samples. Each line in this file corresponds to a single sample containing four fields:
 
-| Field | Description
-| **id** | Unique identifier for each sample.
-| **image_path** | Path to the **source** image, beginning with `image/`.
-| **image_path_target** | Path to the **target** image, beginning with `image/`.
+| Field | Description |
+|:----------------------|:----------------------------------------------------------------------------------|
+| **id** | Unique identifier for each sample. |
+| **image_path** | Path to the **source** image, beginning with `image/`. |
+| **image_path_target** | Path to the **target** image, beginning with `image/`. |
 | **prompt** | The CoT-style instruction describing how to transform the source into the target. |
 
+We recommend using the huggingface `datasets` library to load the dataset efficiently:
+
+```python
+from datasets import load_dataset, Features, Value
+
+features = Features({
+    "id": Value("string"),
+    "image_path": Value("string"),
+    "image_path_target": Value("string"),
+    "prompt": Value("string"),
+})
+
+ds = load_dataset(
+    "json",
+    data_files="DIM-Edit/tos_dataset_edit.jsonl",
+    features=features,
+    split="train",
+)
+
+print(ds[0])
+```
+
 ## Model Usage
 
 ### Environment Setup
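For convenience, here is a minimal sketch of opening the source/target pair referenced by a loaded sample. It assumes the `images.tar.gz` archive was extracted next to `tos_dataset_edit.jsonl`, so that the recorded `image/...` paths resolve under a local `DIM-Edit/` directory (the `DATA_ROOT` name below is ours), and it uses Pillow together with the `ds` object from the `datasets` snippet in the hunk above.

```python
import os
from PIL import Image

# Assumed layout: DIM-Edit/tos_dataset_edit.jsonl plus the extracted DIM-Edit/image/ folder.
DATA_ROOT = "DIM-Edit"

sample = ds[0]  # a row loaded with the `datasets` snippet above
source = Image.open(os.path.join(DATA_ROOT, sample["image_path"])).convert("RGB")
target = Image.open(os.path.join(DATA_ROOT, sample["image_path_target"])).convert("RGB")

print(sample["prompt"])
print(source.size, target.size)
```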
@@ -185,6 +207,9 @@ mkdir checkpoints
 
 Then download the models from our 🤗HF repo below, and move them to the `checkpoints` folder.
 
+*: To facilitate reproducibility, we release [**DIM-4.6B-Edit-Stage1**](https://huggingface.co/stdKonjac/DIM-4.6B-Edit-Stage1), which is trained solely on the **UltraEdit** dataset.
+By fine-tuning this checkpoint on our proposed [**DIM-Edit**](https://huggingface.co/datasets/stdKonjac/DIM-Edit) dataset, you should obtain [**DIM-4.6B-Edit**](https://huggingface.co/stdKonjac/DIM-4.6B-Edit).
+
 | Model | Task | Training Data | ImgEdit | Parameters |
 |:----------------------------------------------------------------------------------|:-------------:|:--------------------------:|:-------:|:---------------:|
 | [**DIM-4.6B-T2I**](https://huggingface.co/stdKonjac/DIM-4.6B-T2I) | Text-to-Image | DIM-T2I + 6.9M Public Data | – | 3.0B❄️ + 1.6B🔥 |
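As a reference for the download step above, a small sketch using the `huggingface_hub` library to pull a checkpoint into the `checkpoints` folder; the local subfolder name is our assumption, so adjust it to whatever layout the `infer/` scripts expect.

```python
from huggingface_hub import snapshot_download

# Download DIM-4.6B-T2I into checkpoints/; repeat with stdKonjac/DIM-4.6B-Edit as needed.
# The target subfolder name is an assumption, not a documented requirement.
snapshot_download(
    repo_id="stdKonjac/DIM-4.6B-T2I",
    local_dir="checkpoints/DIM-4.6B-T2I",
)
```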
@@ -250,18 +275,27 @@ just a placeholder.
 In `infer/demo_edit.py`, use the `set_designer_gpt` API with your own key to set GPT-4o as the external designer for
 optimal performance.
 
+```python
+# GPT-4o as external designer
+model.set_designer_gpt(api_key='')
 ```
 
+You can also use the `set_designer_X` API to set various open-source VLMs as the external designer. The VLMs will be
+automatically downloaded to local disk.
 
+```python
+# Qwen2.5-VL as external designer
+model.set_designer_qwen(version='Qwen/Qwen2.5-VL-3B-Instruct')
+model.set_designer_qwen(version='Qwen/Qwen2.5-VL-7B-Instruct')
+
+# InternVL3.5 as external designer (recommend using transformers==4.53.0)
+model.set_designer_internvl(version='OpenGVLab/InternVL3_5-8B-HF')
+
+# MiMo-VL as external designer
+model.set_designer_mimo(version='XiaomiMimo/MiMo-VL-7B-RL-2508')
+
+# GLM-4.1V as external designer (recommend using transformers==4.53.1)
+model.set_designer_glm(version='THUDM/GLM-4.1V-9B-Thinking')
 ```
 
 To generate edited images from the jsonl file, run the following script:
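The generation script referenced in the last line of the hunk sits outside this diff. Separately, a hypothetical wrapper like the one below shows how the designer setters added above could be switched from the command line; the `--designer` flag and the `choose_designer` helper are our own illustration and not part of `infer/demo_edit.py`.

```python
import argparse

def choose_designer(model, name: str, api_key: str = "") -> None:
    # Dispatch over the setters documented above; the short names are our own choice.
    setters = {
        "gpt": lambda: model.set_designer_gpt(api_key=api_key),
        "qwen": lambda: model.set_designer_qwen(version="Qwen/Qwen2.5-VL-7B-Instruct"),
        "internvl": lambda: model.set_designer_internvl(version="OpenGVLab/InternVL3_5-8B-HF"),
        "mimo": lambda: model.set_designer_mimo(version="XiaomiMimo/MiMo-VL-7B-RL-2508"),
        "glm": lambda: model.set_designer_glm(version="THUDM/GLM-4.1V-9B-Thinking"),
    }
    setters[name]()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--designer", choices=["gpt", "qwen", "internvl", "mimo", "glm"], default="gpt")
    parser.add_argument("--api_key", default="")
    args = parser.parse_args()
    # `model` is assumed to be constructed as in infer/demo_edit.py before this call:
    # choose_designer(model, args.designer, args.api_key)
```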