Kalaido-qwen-image-lora: Reinforcement Learning Enhanced Qwen-Image




🌟 Introduction

Kalaido-qwen-image-lora is a LoRA fine-tune of the original Qwen-Image model. It was trained with state-of-the-art reinforcement learning (RL) techniques to build on the strong foundation of Qwen-Image.
The resulting model demonstrates:

  • Sharper and more readable text rendering.
  • Better aesthetic composition and lighting balance.
  • Improved semantic alignment between textual prompts and visual generations.

βš™οΈ Example Usage

Install the latest version of diffusers from source:

pip install git+https://github.com/huggingface/diffusers

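Note: The LoRA works best when it performs only part of the denoising, so it is used for only the first 10 steps. The example below switches it off with a step-end callback, and the remaining steps run with the base Qwen-Image weights.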
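The schedule described in the note can be written as a piecewise choice of transformer weights. This is a sketch that assumes the standard LoRA parameterization, in which a low-rank update BA is scaled by alpha/r. The rank r and scale alpha of this checkpoint are not stated here. Let k be the zero-based step index, N the total number of inference steps, and K the number of LoRA steps (K = 10 in the note above). Each adapted weight matrix of the transformer is then:

```latex
W^{(k)} =
\begin{cases}
W_0 + \dfrac{\alpha}{r}\, B A, & 0 \le k < K, \\[4pt]
W_0, & K \le k < N.
\end{cases}
```

Here W_0 is the frozen Qwen-Image weight and BA is the low-rank update learned during RL fine-tuning. In diffusers, calling `disable_lora()` on the pipeline corresponds to moving from the first case to the second.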

import torch
from diffusers import QwenImagePipeline, QwenImageTransformer2DModel

model_id = "Qwen/Qwen-Image"  # base model on the Hugging Face Hub
lora_id = "FractalAIResearch/Kalaido-qwen-image-lora"  # Kalaido LoRA weights

# Load the base transformer
transformer = QwenImageTransformer2DModel.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, subfolder="transformer"
)

# Load the pipeline
pipe = QwenImagePipeline.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, transformer=transformer
)
# Attach the Kalaido LoRA to the pipeline (lora_id was defined above)
pipe.load_lora_weights(lora_id, weight_name="pytorch_lora_weights.safetensors", adapter_name="aes")

pipe.to("cuda")

pipe.enable_vae_tiling()
pipe.enable_vae_slicing()

prompt = "A blackboard that says 'AI research FRACTAL'"
negative_prompt = " "  # blank negative prompt when there is no specific concept to remove

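def callback_on_step_end(pipeline, i, t, callback_kwargs):
    # Called after denoising step `i` finishes: once step index 10 is done,
    # switch the LoRA off so the base model handles the remaining steps.
    if i == 10:
        pipeline.disable_lora()
    return callback_kwargs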

image = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    num_inference_steps=50,
    true_cfg_scale=4.0,
    generator=torch.Generator(device='cuda').manual_seed(42),
    callback_on_step_end=callback_on_step_end,
).images[0]

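image.save("output.png")

# The callback leaves the LoRA disabled; re-enable it before the next generation.
pipe.enable_lora()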
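The step cutoff in the callback above can be wrapped in a small reusable factory. This is a hedged sketch and is not part of diffusers or of the original release. `make_lora_cutoff_callback` and `num_lora_steps` are hypothetical names. The sketch assumes, as in the example above, that the pipeline calls `callback_on_step_end(pipeline, step_index, timestep, callback_kwargs)` after each step finishes and that `step_index` is zero-based. Under that assumption, the LoRA stays active for exactly `num_lora_steps` steps.

```python
from typing import Any, Callable, Dict


def make_lora_cutoff_callback(num_lora_steps: int) -> Callable[..., Dict[str, Any]]:
    """Build a step-end callback that keeps the LoRA on for `num_lora_steps` steps.

    Hypothetical helper: the pipeline is assumed to call it after each step as
    callback(pipeline, step_index, timestep, callback_kwargs), with a
    zero-based step_index.
    """
    if num_lora_steps < 1:
        raise ValueError("num_lora_steps must be at least 1")

    def callback(pipeline: Any, step_index: int, timestep: Any,
                 callback_kwargs: Dict[str, Any]) -> Dict[str, Any]:
        # Step `num_lora_steps - 1` is the last step that should use the LoRA;
        # once it has finished, every later step runs on the base weights.
        if step_index == num_lora_steps - 1:
            pipeline.disable_lora()
        # Return the tensors unchanged: this callback only toggles the adapter.
        return callback_kwargs

    return callback
```

Usage, assuming `pipe` from the example above: `pipe(prompt=prompt, num_inference_steps=50, callback_on_step_end=make_lora_cutoff_callback(10))`, followed by `pipe.enable_lora()` before the next call. Returning `callback_kwargs` unchanged is the usual convention for callbacks that do not modify tensors.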

🧪 Evaluation

Kalaido-qwen-image-lora was evaluated on multiple benchmarks that measure text rendering, visual aesthetics, and alignment with human preferences. In all evaluations, the LoRA is applied for only 10 denoising steps:

  • OneIG: a comprehensive benchmark framework for fine-grained evaluation of text-to-image (T2I) models across multiple dimensions. These include subject-element alignment, text rendering precision, reasoning-generated content, stylization, and diversity. Evaluation was conducted only on the English subset of the OneIG text benchmark.

    Model                     Alignment   Text
    FLUX.1 [Dev]              0.786       0.523
    HiDream-I1-Full           0.829       0.707
    Seedream 3.0              0.818       0.865
    GPT Image 1 [High]        0.851       0.857
    Qwen-Image                0.882       0.891
    Kalaido-qwen-image-lora   0.889       0.979

  • LongText-Bench: proposed in X-Omni, this benchmark evaluates the rendering of long texts in both English and Chinese. We evaluate our model only on the English subset.

    Model                                 LongText-Bench-EN
    HiDream-I1-Full (Cai et al., 2025)    0.543
    FLUX.1 [Dev] (BlackForest, 2024)      0.607
    Seedream 3.0 (Gao et al., 2025)       0.896
    GPT Image 1 [High] (OpenAI, 2025)     0.956
    Qwen-Image                            0.935
    Kalaido-qwen-image-lora               0.939

  • Aesthetic score: computed on 1,000 prompts randomly sampled from the HPSv3 test set.

    Model                     Aesthetic Score
    Qwen-Image                6.62
    FLUX.1 [Dev]              6.71
    HiDream-I1-Full           6.70
    GPT-Image-1               6.79
    Kalaido-qwen-image-lora   6.88
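The small script below makes the gains over the base model explicit. It computes the absolute and relative differences between Kalaido-qwen-image-lora and Qwen-Image, using only the scores reported in the tables above. `SCORES` and `compare` are illustrative names, and nothing here re-runs a benchmark.

```python
# Scores copied from the evaluation tables above:
# benchmark -> (Qwen-Image, Kalaido-qwen-image-lora)
SCORES = {
    "OneIG Alignment": (0.882, 0.889),
    "OneIG Text": (0.891, 0.979),
    "LongText-Bench-EN": (0.935, 0.939),
    "Aesthetic Score": (6.62, 6.88),
}


def compare(scores):
    """Return {benchmark: (absolute gain, relative gain in percent)}, rounded."""
    result = {}
    for name, (base, lora) in scores.items():
        gain = lora - base
        result[name] = (round(gain, 3), round(100.0 * gain / base, 2))
    return result


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in compare(SCORES).items():
        print(f"{name:<20} {abs_gain:+.3f} ({rel_gain:+.2f}%)")
```

On these numbers, the largest change is on the OneIG text metric (0.891 to 0.979, roughly +9.9% relative). The OneIG alignment and LongText-Bench-EN differences are both under 1%.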

Qualitative comparisons

The figure below compares image generations from the baseline Qwen-Image model (left) and our Kalaido-qwen-image-lora model (right). Kalaido-qwen-image-lora consistently produces outputs with improved aesthetics and better semantic alignment to the given prompts.

[Figure: Comparison Grid 1, Comparison Grid 2]

The figure below compares image generations from the baseline Qwen-Image model (left) and our Kalaido-qwen-image-lora model (right). Kalaido-qwen-image-lora consistently produces outputs with improved aesthetics and better text rendering.

[Figure: Comparison Grid 1, Comparison Grid 2]

License

Kalaido-qwen-image-lora is licensed under the Apache License 2.0.
