EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling
Abstract
A specialized reward model, EditScore, enables effective reinforcement learning for instruction-guided image editing by providing a high-fidelity reward signal.
Instruction-guided image editing has achieved remarkable progress, yet current models still struggle with complex instructions and often require multiple samples to produce a desired result. Reinforcement Learning (RL) offers a promising solution, but its adoption in image editing has been severely hindered by the lack of a high-fidelity, efficient reward signal. In this work, we present a comprehensive methodology to overcome this barrier, centered on the development of a state-of-the-art, specialized reward model. We first introduce EditReward-Bench, a comprehensive benchmark to systematically evaluate reward models on editing quality. Building on this benchmark, we develop EditScore, a series of reward models (7B-72B) for evaluating the quality of instruction-guided image editing. Through meticulous data curation and filtering, EditScore effectively matches the performance of leading proprietary VLMs. Furthermore, coupled with a self-ensemble strategy tailored to the generative nature of EditScore, our largest variant even surpasses GPT-5 on the benchmark. We then demonstrate that a high-fidelity reward model is the key to unlocking online RL for image editing. Our experiments show that, while even the largest open-source VLMs fail to provide an effective learning signal, EditScore enables efficient and robust policy optimization. Applying our framework to a strong base model, OmniGen2, yields a final model with a substantial and consistent performance uplift. Overall, this work provides the first systematic path from benchmarking to reward modeling to RL training in image editing, showing that a high-fidelity, domain-specialized reward model is key to unlocking the full potential of RL in this domain.
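To make the self-ensemble idea concrete, here is a minimal sketch of averaging k independent scoring passes of a generative reward model (the Avg@4 setting discussed in the comments below). The `score_edit` function is a hypothetical placeholder for a single EditScore inference call, not the released API.

```python
# Minimal sketch of self-ensembling a generative reward model's scores (e.g. Avg@4).
# `score_edit` is a hypothetical placeholder for one stochastic EditScore inference
# pass; the released model's actual interface may differ.
import random
from statistics import mean

def score_edit(instruction: str, source_image: str, edited_image: str) -> float:
    """Hypothetical single scoring pass returning an edit-quality score in [0, 10]."""
    # Placeholder: a real reward model would run VLM inference on the image pair here.
    return random.uniform(0.0, 10.0)

def self_ensemble_score(instruction: str, source_image: str, edited_image: str,
                        k: int = 4) -> float:
    """Average k independent scoring passes to reduce the variance introduced by
    the reward model's stochastic (generative) decoding."""
    return mean(score_edit(instruction, source_image, edited_image) for _ in range(k))

if __name__ == "__main__":
    reward = self_ensemble_score("turn the sky into a sunset", "input.png", "edited.png")
    print(f"Avg@4 reward: {reward:.2f}")
```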
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- OneReward: Unified Mask-Guided Image Generation via Multi-Task Human Preference Learning (2025)
- Pref-GRPO: Pairwise Preference Reward-based GRPO for Stable Text-to-Image Reinforcement Learning (2025)
- X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning (2025)
- RewardDance: Reward Scaling in Visual Generation (2025)
- VideoRewardBench: Comprehensive Evaluation of Multimodal Reward Models for Video Understanding (2025)
- PromptEnhancer: A Simple Approach to Enhance Text-to-Image Models via Chain-of-Thought Prompt Rewriting (2025)
- VideoScore2: Think before You Score in Generative Video Evaluation (2025)
Quick question for the authors about Table 2 (Benchmark results on EditReward-Bench): the scores for the EditScore models are reported for both Base and Avg@4, but for the reference models (GPT-5, ...) it isn't specified whether they use Base or Avg@4. Which one is it?
Hi, thank you for your interest! All other models are reported with the Base setting.