Kalaido-qwen-image-lora: Reinforcement Learning Enhanced Qwen-Image
Introduction
Kalaido-qwen-image-lora is a LoRA fine-tune of the original Qwen-Image model, trained with state-of-the-art reinforcement learning (RL) techniques to build on the strong foundation of Qwen-Image.
The resulting model demonstrates:
- Sharper and more readable text rendering.
- Better aesthetic composition and lighting balance.
- Improved semantic alignment between textual prompts and visual generations.
Example Usage
Install the latest version of diffusers from source:
pip install git+https://github.com/huggingface/diffusers
Note: The LoRA works best when it is applied to only part of the denoising process. Hence it is active only during the early steps: the example below disables it through a callback at step index 10 of 50.
import torch
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel

model_id = "Qwen/Qwen-Image"  # Replace with your HF model ID
lora_id = "FractalAIResearch/Kalaido-qwen-image-lora"

# Load the base transformer
transformer = QwenImageTransformer2DModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, subfolder="transformer"
)

# Load the pipeline around the transformer
pipe = QwenImagePipeline.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, transformer=transformer
)
# Attach the LoRA weights from the Hub repository defined in lora_id
pipe.load_lora_weights(lora_id, weight_name="pytorch_lora_weights.safetensors", adapter_name="aes")
pipe.to("cuda")
pipe.enable_vae_tiling()
pipe.enable_vae_slicing()
prompt = "A blackboard that says 'AI research FRACTAL'"
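negative_prompt = " "  # use a blank string if there is no specific concept to remove

# Called at the end of every denoising step; switches the LoRA off
# once step index 10 has finished, so later steps run on the base model.
def callback_on_step_end(self, i, t, callback_kwargs):
    if i == 10:
        self.disable_lora()
    return {}
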
image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device="cuda").manual_seed(42),
    callback_on_step_end=callback_on_step_end,
).images[0]
image.save("output.png")
Evaluation
Kalaido-qwen-image-lora was evaluated on multiple benchmarks that measure text rendering, visual aesthetics, and alignment with human preferences. In all evaluations, the LoRA is used for only 10 steps:
- OneIG-Bench: a comprehensive benchmark framework for fine-grained evaluation of T2I models across multiple dimensions, including subject-element alignment, text rendering precision, reasoning-generated content, stylization, and diversity. Evaluation was conducted only on the English subset of the OneIG text benchmark.
| Model | Alignment | Text |
|---|---|---|
| FLUX.1 [Dev] | 0.786 | 0.523 |
| HiDream-I1-Full | 0.829 | 0.707 |
| Seedream 3.0 | 0.818 | 0.865 |
| GPT Image 1 [High] | 0.851 | 0.857 |
| Qwen-Image | 0.882 | 0.891 |
| Kalaido-qwen-image-lora | 0.889 | 0.979 |
- LongText-Bench: proposed in X-Omni, this benchmark evaluates how well models render longer texts in both English and Chinese. We evaluate our model only on the English subset of this benchmark.
| Model | LongText-Bench-EN |
|---|---|
| HiDream-I1-Full (Cai et al., 2025) | 0.543 |
| FLUX.1 [Dev] (BlackForest, 2024) | 0.607 |
| Seedream 3.0 (Gao et al., 2025) | 0.896 |
| GPT Image 1 [High] (OpenAI, 2025) | 0.956 |
| Qwen-Image | 0.935 |
| Kalaido-qwen-image-lora | 0.939 |
- Aesthetic score: 1,000 prompts were randomly sampled from the HPSv3 test set.
| Model | Aesthetic Score |
|---|---|
| Qwen-Image | 6.62 |
| FLUX.1 [Dev] | 6.71 |
| HiDream-I1-Full | 6.70 |
| GPT-Image-1 | 6.79 |
| Kalaido-qwen-image-lora | 6.88 |
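To put the tables above in perspective, the short sketch below computes the absolute and relative gains of Kalaido-qwen-image-lora over the Qwen-Image baseline. The scores are copied from the tables; the helper name `gains` is an illustrative choice and not part of any evaluation toolkit.

```python
# Scores copied from the tables above: (Qwen-Image, Kalaido-qwen-image-lora).
scores = {
    "Alignment": (0.882, 0.889),
    "Text": (0.891, 0.979),
    "LongText-Bench-EN": (0.935, 0.939),
    "Aesthetic Score": (6.62, 6.88),
}


def gains(base: float, tuned: float) -> tuple[float, float]:
    """Return the absolute gain and the relative gain in percent."""
    absolute = tuned - base
    relative_pct = 100.0 * absolute / base
    return absolute, relative_pct


for name, (base, tuned) in scores.items():
    absolute, relative_pct = gains(base, tuned)
    print(f"{name}: {absolute:+.3f} ({relative_pct:+.1f}%)")
```

By this arithmetic, the largest relative gain is on the Text score (about 9.9%), followed by the aesthetic score (about 3.9%), while the Alignment and LongText-Bench-EN gains are each under 1%.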
Qualitative comparisons
The figure below compares image generations from the baseline Qwen-Image model (left) and our Kalaido-qwen-image-lora model (right). Kalaido-qwen-image-lora consistently produces outputs with improved aesthetics and better semantic alignment to the given prompts.
The figure below compares image generations from the baseline Qwen-Image model (left) and our Kalaido-qwen-image-lora model (right). Kalaido-qwen-image-lora consistently produces outputs with improved aesthetics and better text rendering.
License
Kalaido-qwen-image-lora is licensed under the Apache License 2.0.