ziyun.zeng committed
Commit b43620c · 1 Parent(s): 2008b8d

update README

Files changed (1): README.md +56 -22
README.md CHANGED
@@ -16,6 +16,14 @@ tags:
 
 ![DIM-Edit](assets/dim_edit.png)
 
+## 📰 News
+
+**[2025-10-08]** We release the **DIM-Edit** dataset and the **DIM-4.6B-T2I** / **DIM-4.6B-Edit** models.
+
+**[2025-09-26]** We uploaded a new version of the paper, including more results across various designers.
+
+**[2025-09-02]** The **DIM** paper is released.
+
 ## Introduction
 
 Unified models achieve strong results in text-to-image generation but remain weak in precise editing. This limitation
@@ -116,14 +124,6 @@ models trained on different data corpora.
 
 </details>
 
-## Open-Source Plan
-
-- [x] DIM Paper
-- [x] DIM-4.6B-T2I
-- [x] DIM-4.6B-Edit
-- [x] DIM-Edit Data
-- [ ] DIM-T2I Data
-
 ## Dataset Usage
 
 ### DIM-T2I
@@ -158,13 +158,35 @@ tar -xvzf images.tar.gz
 In the meantime, you will find a JSONL file named `tos_dataset_edit.jsonl` in the root directory, which records all
 image editing samples. Each line in this file corresponds to a single sample containing four fields:
 
-| Field                  | Description                                                                                 |
-|:-----------------------|:--------------------------------------------------------------------------------------------|
-| **id**                 | Unique identifier for each sample.                                                          |
-| **image_path**         | Path to the **source** image, beginning with `image/`.                                      |
-| **image_path_target**  | Path to the **target** image, beginning with `image/`.                                      |
+| Field                  | Description                                                                        |
+|:-----------------------|:------------------------------------------------------------------------------------|
+| **id**                 | Unique identifier for each sample.                                                 |
+| **image_path**         | Path to the **source** image, beginning with `image/`.                             |
+| **image_path_target**  | Path to the **target** image, beginning with `image/`.                             |
 | **prompt**             | The CoT-style instruction describing how to transform the source into the target. |
 
+We recommend using the huggingface `datasets` library to load the dataset efficiently:
+
+```python
+from datasets import load_dataset, Features, Value
+
+features = Features({
+    "id": Value("string"),
+    "image_path": Value("string"),
+    "image_path_target": Value("string"),
+    "prompt": Value("string"),
+})
+
+ds = load_dataset(
+    "json",
+    data_files="DIM-Edit/tos_dataset_edit.jsonl",
+    features=features,
+    split="train",
+)
+
+print(ds[0])
+```
+
 ## Model Usage
 
 ### Environment Setup
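The `load_dataset` call added in the hunk above returns only the string fields. As a complement, here is a minimal sketch of pairing each record with its actual image files; it assumes Pillow is installed and that `images.tar.gz` was extracted so the `image/` paths resolve against the dataset root (the `DIM-Edit/` root directory is an assumption, adjust to your local layout):

```python
import json
from pathlib import Path

from PIL import Image

root = Path("DIM-Edit")  # assumed dataset root; adjust to your local layout

# Walk the JSONL records and open each source/target image pair.
with open(root / "tos_dataset_edit.jsonl", encoding="utf-8") as f:
    for line in f:
        sample = json.loads(line)
        source = Image.open(root / sample["image_path"])         # source image
        target = Image.open(root / sample["image_path_target"])  # target image
        print(sample["id"], source.size, target.size, sample["prompt"][:80])
        break  # demo only: stop after the first sample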
@@ -185,6 +207,9 @@ mkdir checkpoints
 
 Then download the models from our 🤗HF repo below, and move them to the `checkpoints` folder.
 
+*: To facilitate reproducibility, we release [**DIM-4.6B-Edit-Stage1**](https://huggingface.co/stdKonjac/DIM-4.6B-Edit-Stage1), which is trained solely on the **UltraEdit** dataset.
+By fine-tuning this checkpoint on our proposed [**DIM-Edit**](https://huggingface.co/datasets/stdKonjac/DIM-Edit) dataset, you should obtain [**DIM-4.6B-Edit**](https://huggingface.co/stdKonjac/DIM-4.6B-Edit).
+
 | Model                                                             |     Task      |       Training Data        | ImgEdit |   Parameters    |
 |:------------------------------------------------------------------|:-------------:|:--------------------------:|:-------:|:---------------:|
 | [**DIM-4.6B-T2I**](https://huggingface.co/stdKonjac/DIM-4.6B-T2I) | Text-to-Image | DIM-T2I + 6.9M Public Data |    –    | 3.0B❄️ + 1.6B🔥 |
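For the download step in this hunk, one option is the `huggingface_hub` client; a sketch using a repo id from the table above (the per-model subfolder layout under `checkpoints/` is an assumption, match whatever the inference scripts expect):

```python
from huggingface_hub import snapshot_download

# Download a model repo into the local checkpoints folder.
snapshot_download(
    repo_id="stdKonjac/DIM-4.6B-T2I",      # or stdKonjac/DIM-4.6B-Edit, etc.
    local_dir="checkpoints/DIM-4.6B-T2I",  # assumed layout under checkpoints/
)
```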
@@ -250,18 +275,27 @@ just a placeholder.
 In `infer/demo_edit.py`, use the `set_designer_gpt` API with your own key to set GPT-4o as the external designer for
 optimal performance.
 
-```
-model.set_designer_gpt(api_key='') # DIM-4.6B-Edit
+```python
+# GPT-4o as external designer
+model.set_designer_gpt(api_key='')
 ```
 
-You can also use the `set_designer_qwen` API to set Qwen2.5-VL-XB as the external designer. Qwen models will be
-automatically
-downloaded
-to local disk.
+You can also use the `set_designer_X` API to set various open-source VLMs as the external designer. The VLMs will be
+automatically downloaded to local disk.
 
-```
-model.set_designer_qwen(version='Qwen/Qwen2.5-VL-3B-Instruct') # DIM-4.6B-Edit-Q3B
-model.set_designer_qwen(version='Qwen/Qwen2.5-VL-7B-Instruct') # DIM-4.6B-Edit-Q7B
+```python
+# Qwen2.5-VL as external designer
+model.set_designer_qwen(version='Qwen/Qwen2.5-VL-3B-Instruct')
+model.set_designer_qwen(version='Qwen/Qwen2.5-VL-7B-Instruct')
+
+# InternVL3.5 as external designer (recommend using transformers==4.53.0)
+model.set_designer_internvl(version='OpenGVLab/InternVL3_5-8B-HF')
+
+# MiMo-VL as external designer
+model.set_designer_mimo(version='XiaomiMimo/MiMo-VL-7B-RL-2508')
+
+# GLM-4.1V as external designer (recommend using transformers==4.53.1)
+model.set_designer_glm(version='THUDM/GLM-4.1V-9B-Thinking')
 ```
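Since the `set_designer_*` calls added in this hunk all follow one pattern, a small dispatch helper can turn the designer choice into a config option. This is a sketch built only on the APIs shown above; the helper function and the short-name mapping are illustrative, not part of the repo:

```python
# Map a short name to the corresponding set_designer_* call shown above.
DESIGNERS = {
    "gpt": lambda m: m.set_designer_gpt(api_key=""),  # fill in your API key
    "qwen-3b": lambda m: m.set_designer_qwen(version="Qwen/Qwen2.5-VL-3B-Instruct"),
    "qwen-7b": lambda m: m.set_designer_qwen(version="Qwen/Qwen2.5-VL-7B-Instruct"),
    "internvl": lambda m: m.set_designer_internvl(version="OpenGVLab/InternVL3_5-8B-HF"),
    "mimo": lambda m: m.set_designer_mimo(version="XiaomiMimo/MiMo-VL-7B-RL-2508"),
    "glm": lambda m: m.set_designer_glm(version="THUDM/GLM-4.1V-9B-Thinking"),
}

def set_designer(model, name: str) -> None:
    """Attach the chosen external designer to the DIM model."""
    DESIGNERS[name](model)

# e.g. inside infer/demo_edit.py:
# set_designer(model, "qwen-7b")
```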
 
 To generate edited images from the jsonl file, run the following script:
 