frankzeng committed
Commit f77b8ed · verified · 1 Parent(s): 93ddc64

Upload folder using huggingface_hub
.gitattributes CHANGED
@@ -35,3 +35,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  processor/tokenizer.json filter=lfs diff=lfs merge=lfs -text
  text_encoder/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ assets/arch.png filter=lfs diff=lfs merge=lfs -text
+ assets/eval_res_en.png filter=lfs diff=lfs merge=lfs -text
+ assets/image_edit_demo.gif filter=lfs diff=lfs merge=lfs -text
+ assets/results_show.png filter=lfs diff=lfs merge=lfs -text
+ examples/0000.jpg filter=lfs diff=lfs merge=lfs -text
+ examples/0001.png filter=lfs diff=lfs merge=lfs -text
+ examples/0002.jpg filter=lfs diff=lfs merge=lfs -text
+ examples/0003.png filter=lfs diff=lfs merge=lfs -text
+ examples/0004.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,101 @@
- ---
- license: apache-2.0
- ---

---
license: apache-2.0
language:
- en
- zh
pipeline_tag: image-to-image
library_name: diffusers
---

## 🔥🔥🔥 News!!
* Sep 08, 2025: 👋 We release [step1x-edit-v1p2-preview](https://huggingface.co/stepfun-ai/Step1X-Edit-v1p2-preview), a new version of Step1X-Edit with reasoning-based editing ability and better performance (report to be released soon), featuring:
  - Native Reasoning Edit Model: combines instruction reasoning with reflective correction to handle complex edits more accurately. Performance on KRIS-Bench:

    | Models | Factual Knowledge ⬆️ | Conceptual Knowledge ⬆️ | Procedural Knowledge ⬆️ | Overall ⬆️ |
    |:------------:|:------------:|:------------:|:------------:|:------------:|
    | Step1X-Edit v1.1 | 53.05 | 54.34 | 44.66 | 51.59 |
    | Step1X-Edit-v1p2-preview | 60.49 | 58.81 | 41.77 | 52.51 |
    | Step1X-Edit-v1p2-preview (thinking) | 62.24 | 62.25 | 44.43 | 55.21 |
    | Step1X-Edit-v1p2-preview (thinking + reflection) | 62.94 | 61.82 | 44.08 | 55.64 |

  - Improved image editing quality and better instruction following. Performance on GEdit-Bench:

    | Models | G_SC ⬆️ | G_PQ ⬆️ | G_O ⬆️ | Q_SC ⬆️ | Q_PQ ⬆️ | Q_O ⬆️ |
    |:------------:|:------------:|:------------:|:------------:|:------------:|:------------:|:------------:|
    | Step1X-Edit (v1.0) | 7.13 | 7.00 | 6.44 | 7.39 | 7.28 | 7.07 |
    | Step1X-Edit (v1.1) | 7.66 | 7.35 | 6.97 | 7.65 | 7.41 | 7.35 |
    | Step1X-Edit-v1p2-preview | 8.14 | 7.55 | 7.42 | 7.90 | 7.34 | 7.40 |
<!-- ## Image Edit Demos -->

<div align="center">
  <img width="720" alt="demo" src="assets/image_edit_demo.gif">
  <p><b>Step1X-Edit:</b> a unified image editing model that performs impressively on a wide range of genuine user instructions.</p>
</div>

## 🧩 Model Usage
Install the `diffusers` package with the following commands:
```bash
git clone -b dev/MergeV1-2 https://github.com/Peyton-Chen/diffusers.git
cd diffusers
pip install -e .
```
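
As a quick sanity check, confirm that this branch exposes the pipeline class used below; if the import fails, the editable install did not take effect:

```python
# The dev/MergeV1-2 branch installed above should provide this class.
from diffusers import Step1XEditPipelineV1P2

print(Step1XEditPipelineV1P2.__name__)
```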

Here is an example of using the `Step1XEditPipelineV1P2` class to edit an image with thinking and reflection enabled:

```python
import torch
from diffusers import Step1XEditPipelineV1P2
from diffusers.utils import load_image

# Load the pipeline in bfloat16 and move it to the GPU.
pipe = Step1XEditPipelineV1P2.from_pretrained(
    "stepfun-ai/Step1X-Edit-v1p2-preview", torch_dtype=torch.bfloat16
)
pipe.to("cuda")

print("=== processing image ===")
image = load_image("examples/0000.jpg").convert("RGB")
prompt = "add a ruby pendant on the girl's neck."

enable_thinking_mode = True
enable_reflection_mode = True
pipe_output = pipe(
    image=image,
    prompt=prompt,
    num_inference_steps=28,
    true_cfg_scale=4,
    generator=torch.Generator().manual_seed(42),
    enable_thinking_mode=enable_thinking_mode,
    enable_reflection_mode=enable_reflection_mode,
)

# With thinking enabled, the pipeline also returns the reformatted prompt.
if enable_thinking_mode:
    print("Reformat Prompt:", pipe_output.reformat_prompt)

# Save every returned image; with reflection enabled, also print the
# reflection trace for each one.
for image_idx in range(len(pipe_output.images)):
    pipe_output.images[image_idx].save(f"0001-{image_idx}.jpg")
    if enable_reflection_mode:
        print(pipe_output.think_info[image_idx])
```
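
If you are short on GPU memory, the standard diffusers offloading hook may help, assuming this pipeline supports it like other diffusers pipelines:

```python
# Instead of pipe.to("cuda"): keep submodules on the CPU and move each one
# to the GPU only while it runs (lower peak VRAM, slower inference).
pipe.enable_model_cpu_offload()
```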

The results will look like this:
<div align="center">
  <img width="1080" alt="results" src="assets/results_show.png">
</div>

## 📑 Model Introduction
<div align="center">
  <img width="720" alt="demo" src="assets/arch.png">
</div>

Framework of Step1X-Edit: the model leverages the image understanding capabilities of MLLMs to parse editing instructions and generate editing tokens, which are then decoded into images using a DiT-based network. For more details, please refer to our [technical report](https://arxiv.org/abs/2504.17761).
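
The paragraph above describes a two-stage flow: an MLLM turns the instruction and source image into editing tokens, and a DiT decodes those tokens into the edited image. The toy sketch below only illustrates that data flow; all module names, shapes, and the conditioning scheme are simplified stand-ins, not the actual implementation:

```python
import torch
import torch.nn as nn

class ToyMLLMEncoder(nn.Module):
    """Stand-in for the MLLM that parses the edit instruction plus the
    source image and emits a short sequence of editing tokens."""
    def __init__(self, dim=64, num_tokens=16):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.num_tokens = num_tokens

    def forward(self, instruction_emb, image_emb):
        # Fuse instruction and image embeddings, keep a fixed token budget.
        fused = torch.cat([instruction_emb, image_emb], dim=1)
        return self.proj(fused)[:, : self.num_tokens]

class ToyDiTDecoder(nn.Module):
    """Stand-in for the DiT-based network that decodes editing tokens
    (used as conditioning) into the edited image latent."""
    def __init__(self, dim=64):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)

    def forward(self, noisy_latent, editing_tokens):
        # Simplified conditioning: concatenate tokens along the sequence
        # axis, run one transformer block, and keep the latent positions.
        seq = torch.cat([editing_tokens, noisy_latent], dim=1)
        return self.block(seq)[:, editing_tokens.shape[1]:]

encoder, decoder = ToyMLLMEncoder(), ToyDiTDecoder()
instruction = torch.randn(1, 8, 64)   # instruction embeddings
source_img = torch.randn(1, 32, 64)   # source-image embeddings
latent = torch.randn(1, 32, 64)       # noisy image latent
edited = decoder(latent, encoder(instruction, source_img))
print(edited.shape)  # torch.Size([1, 32, 64])
```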

We release [GEdit-Bench](https://huggingface.co/datasets/stepfun-ai/GEdit-Bench), a new benchmark grounded in real-world usage and developed to support more authentic and comprehensive evaluation. Carefully curated to reflect actual user editing needs across a wide range of editing scenarios, it enables realistic assessment of image editing models. Partial results on the benchmark are shown below:
<div align="center">
  <img width="1080" alt="results" src="assets/eval_res_en.png">
</div>

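To browse the benchmark yourself, it should be loadable with the standard `datasets` API (an untested sketch; the exact configuration and split names are assumptions, so check the dataset card):

```python
from datasets import load_dataset

# Hypothetical usage: configuration/split names may differ from this default.
gedit_bench = load_dataset("stepfun-ai/GEdit-Bench")
print(gedit_bench)
```
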
## Citation
```bibtex
@article{liu2025step1x-edit,
  title={Step1X-Edit: A Practical Framework for General Image Editing},
  author={Shiyu Liu and Yucheng Han and Peng Xing and Fukun Yin and Rui Wang and Wei Cheng and Jiaqi Liao and Yingming Wang and Honghao Fu and Chunrui Han and Guopeng Li and Yuang Peng and Quan Sun and Jingwei Wu and Yan Cai and Zheng Ge and Ranchen Ming and Lei Xia and Xianfang Zeng and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Gang Yu and Daxin Jiang},
  journal={arXiv preprint arXiv:2504.17761},
  year={2025}
}
```
assets/arch.png ADDED

Git LFS Details

  • SHA256: e350dd53520acd47e7e615cc624aa8a3268dd8a3f0ba404716b75a6cf5cda16b
  • Pointer size: 131 Bytes
  • Size of remote file: 116 kB
assets/eval_res_en.png ADDED

Git LFS Details

  • SHA256: 12fb4b5fe83d00114806da6ec3bd1df77e293ec3fe5f1e951b466a7812dcec2d
  • Pointer size: 131 Bytes
  • Size of remote file: 435 kB
assets/image_edit_demo.gif ADDED

Git LFS Details

  • SHA256: a513ccd6459d1838748d27af20458da2476cfbd012fc56837123339dafd423e2
  • Pointer size: 133 Bytes
  • Size of remote file: 12.9 MB
assets/logo.png ADDED
assets/results_show.png ADDED

Git LFS Details

  • SHA256: 8ac57118e59a67a60572ad9fce704bc81e2c3378bba47febed0936582e4eb76a
  • Pointer size: 132 Bytes
  • Size of remote file: 2.48 MB
examples/0000.jpg ADDED

Git LFS Details

  • SHA256: 8a20ba65fb5444d96ca1f10d15b3a75eca00feb1242fe1c83770b6b4ff7f7cbb
  • Pointer size: 132 Bytes
  • Size of remote file: 1.38 MB
examples/0001.png ADDED

Git LFS Details

  • SHA256: 5296f9a405c1c24d4b03deb24257d7a6ea28591262cec18264ccf012d4934d0f
  • Pointer size: 132 Bytes
  • Size of remote file: 3.85 MB
examples/0002.jpg ADDED

Git LFS Details

  • SHA256: 9b6a31f9e9f6ad060e4b537bc28b45e583f6e3f6aa08a867f59c733aa00fabf0
  • Pointer size: 131 Bytes
  • Size of remote file: 442 kB
examples/0003.png ADDED

Git LFS Details

  • SHA256: 902ad4831a62d0eb30d3b128ea23da8b03419980a6658d6388f8bb48c26d1103
  • Pointer size: 131 Bytes
  • Size of remote file: 427 kB
examples/0004.jpg ADDED

Git LFS Details

  • SHA256: c2434d3216d7d626e6500bca681c6ea65fb41ac52b4346ad33a225b13ed3ac3a
  • Pointer size: 131 Bytes
  • Size of remote file: 701 kB