===================================================================================
This model is a refined and 8-bit/4-bit (fp8_e4m3fn / Q8_0 / Q4_1) quantized version of https://huggingface.co/tencent/SRPO. It mainly improves the clarity of the generated images and the compatibility of the model. Note: in the first image, the output labeled SRPO-fp8 looks unusually blurry because the original model was loaded and quantized on the fly by ComfyUI's diffusion model loader node; this is not the model's actual behavior at fp8 precision. The second comparison image is provided to avoid this misunderstanding and shows that the model performs normally at each precision.
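For reference, here is a minimal sketch of the difference between an on-the-fly fp8 cast and a pre-quantized checkpoint. The file names and the rule for which layers stay at higher precision are illustrative assumptions, not the actual recipe used for this release:

```python
# Sketch only: assumes PyTorch >= 2.1 (float8 support) and safetensors installed.
# File names below are placeholders, not the actual files in this repository.
import torch
from safetensors.torch import load_file, save_file

state = load_file("SRPO-bf16.safetensors")  # hypothetical full-precision checkpoint

# Naive on-load cast, roughly what a generic fp8 loader does: every floating-point
# weight is cast to float8_e4m3fn with no special handling.
naive_fp8 = {
    k: v.to(torch.float8_e4m3fn) if v.is_floating_point() else v
    for k, v in state.items()
}

# A pre-quantized checkpoint is instead prepared offline, where sensitive tensors
# (here, as an example, anything with "norm" in its name) can be kept at full precision.
prequantized = {
    k: v.to(torch.float8_e4m3fn) if v.is_floating_point() and "norm" not in k else v
    for k, v in state.items()
}
save_file(prequantized, "SRPO-fp8_e4m3fn.safetensors")
```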
For FP16 version:
Please refer to: https://civitai.com/models/1961797 or https://www.modelscope.cn/models/wikeeyang/SRPO-Refine-Quantized
Comparison between the official SRPO and this Refine & Quantized v1.0 at the same quantization precision:
Example workflow: Please refer to workflow.png
License Agreement
This model falls under the SRPO license; please refer to the license.txt file and to the FLUX.1 [dev] Non-Commercial License.
See also: https://civitai.com/models/1953067
The following section is quoted from the original model card:
===================================================================================
Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference
Chunyu Wang¹, Qinglin Lu¹, Yansong Tang³✝
²School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen
³Shenzhen International Graduate School, Tsinghua University
*Equal contribution ✝Corresponding author
Abstract
Recent studies have demonstrated the effectiveness of directly aligning diffusion models with human preferences using differentiable reward. However, they exhibit two primary challenges: (1) they rely on multistep denoising with gradient computation for reward scoring, which is computationally expensive, thus restricting optimization to only a few diffusion steps; (2) they often need continuous offline adaptation of reward models in order to achieve desired aesthetic quality, such as photorealism or precise lighting effects. To address the limitation of multistep denoising, we propose Direct-Align, a method that predefines a noise prior to effectively recover original images from any time steps via interpolation, leveraging the equation that diffusion states are interpolations between noise and target images, which effectively avoids over-optimization in late timesteps. Furthermore, we introduce Semantic Relative Preference Optimization (SRPO), in which rewards are formulated as text-conditioned signals. This approach enables online adjustment of rewards in response to positive and negative prompt augmentation, thereby reducing the reliance on offline reward fine-tuning. By fine-tuning the FLUX.1.dev model with optimized denoising and online reward adjustment, we improve its human-evaluated realism and aesthetic quality by over 3x.
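As a side note, the interpolation the abstract leverages can be written down directly. Below is a minimal sketch under the rectified-flow convention used by FLUX (x_t = (1 − t)·x₀ + t·noise); the paper's exact notation and timestep parameterization may differ:

```python
# Sketch of the interpolation identity behind Direct-Align (assumed rectified-flow form).
import torch

def add_noise(x0: torch.Tensor, noise: torch.Tensor, t: float) -> torch.Tensor:
    """Diffuse a clean latent x0 to timestep t using a known, predefined noise sample."""
    return (1.0 - t) * x0 + t * noise

def recover_x0(xt: torch.Tensor, noise: torch.Tensor, t: float) -> torch.Tensor:
    """Invert the interpolation in closed form when the injected noise is known,
    so the original image can be recovered from any single timestep."""
    return (xt - t * noise) / (1.0 - t)

x0 = torch.randn(1, 16, 64, 64)   # stand-in for a clean latent
noise = torch.randn_like(x0)      # the predefined noise prior
xt = add_noise(x0, noise, t=0.8)
assert torch.allclose(recover_x0(xt, noise, t=0.8), x0, atol=1e-5)
```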
Checkpoints
The diffusion_pytorch_model.safetensors is the online version of SRPO based on FLUX.1 Dev, trained on the HPD dataset with HPSv2.
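A minimal loading sketch with diffusers, assuming this checkpoint is a full-precision FLUX transformer state dict; the paths and generation settings below are illustrative and not taken from the original instructions:

```python
# Sketch only: assumes diffusers with FLUX support, a CUDA GPU, and local access
# to the base FLUX.1-dev weights. Prompt and sampler settings are illustrative.
import torch
from diffusers import FluxPipeline
from safetensors.torch import load_file

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Swap the base transformer weights for the SRPO checkpoint.
state_dict = load_file("diffusion_pytorch_model.safetensors")
pipe.transformer.load_state_dict(state_dict)

image = pipe(
    "a photorealistic portrait, soft window light",
    num_inference_steps=30,
    guidance_scale=3.5,
).images[0]
image.save("srpo_sample.png")
```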
License
SRPO is licensed under the License Terms of SRPO. See ./License.txt for more details.
Citation
If you use SRPO for your research, please cite our paper:
@misc{shen2025directlyaligningdiffusiontrajectory,
title={Directly Aligning the Full Diffusion Trajectory with Fine-Grained Human Preference},
author={Xiangwei Shen and Zhimin Li and Zhantao Yang and Shiyi Zhang and Yingfang Zhang and Donghao Li and Chunyu Wang and Qinglin Lu and Yansong Tang},
year={2025},
eprint={2509.06942},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2509.06942},
}