FLUX.1-dev LoRA Fine-tuned with Flow-GRPO
This LoRA (Low-Rank Adaptation) model is a fine-tuned version of FLUX.1-dev using Flow-GRPO (Flow-based Group Relative Policy Optimization), a novel reinforcement learning technique for flow matching models.
Model Description
This model was trained using the Flow-GRPO methodology described in the paper "Flow-GRPO: Training Flow Matching Models via Online RL". Flow-GRPO integrates online reinforcement learning into flow matching models by:
- ODE-to-SDE conversion: Transforms deterministic flow matching into stochastic sampling for RL exploration
- Denoising reduction: Uses fewer denoising steps during training while maintaining full quality at inference
- Human preference optimization: Trained with PickScore reward to align with human preferences (a sketch of the group-relative advantage is shown after this list)
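As a rough illustration of the "group relative" part of the objective (a sketch, not the authors' code), rewards for a group of images generated from the same prompt can be normalized against each other to form advantages:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Normalize each sample's reward against the other samples drawn for the
    same prompt, yielding a GRPO-style group-relative advantage.

    rewards: tensor of shape (num_prompts, images_per_prompt), e.g. PickScore values.
    """
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example with 4 images per prompt (matching the sampling configuration below)
rewards = torch.tensor([[0.21, 0.25, 0.19, 0.27]])
print(group_relative_advantages(rewards))
```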
Training Details
Core Configuration
- Base Model: FLUX.1-dev
- Training Method: Flow-GRPO with PickScore reward
- Resolution: 512×512
- Mixed Precision: bfloat16
- Seed: 42
LoRA Configuration
- LoRA Enabled: True
- Rank: Not specified in config (typically 32-64)
- Target Modules: Transformer layers (an illustrative PEFT configuration is sketched after this list)
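The adapter config does not record the rank or the exact module names, so the snippet below is only a hypothetical PEFT setup consistent with the description; the rank of 32 and the attention-projection module names are assumptions, not values taken from this repository.

```python
from peft import LoraConfig

# Hypothetical configuration; r and target_modules are assumed, not read from adapter_config.json
lora_config = LoraConfig(
    r=32,                     # assumed; the card only notes "typically 32-64"
    lora_alpha=32,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections in the FLUX transformer blocks
)
```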
Training Hyperparameters
- Learning Rate: 5e-5
- Batch Size: 1 (with gradient accumulation: 32 steps)
- Optimizer: 8-bit AdamW (see the setup sketch after this list)
- β₁: 0.9
- β₂: 0.999
- Weight Decay: 1e-4
- Epsilon: 1e-8
- Gradient Clipping: Max norm 1.0
- Max Epochs: 100,000
- Save Frequency: Every 100 steps
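Taken together, these settings correspond to roughly the following optimizer setup (a minimal sketch using bitsandbytes for the 8-bit AdamW; the toy module stands in for the trainable LoRA parameters):

```python
import bitsandbytes as bnb
import torch

# Toy module standing in for the trainable LoRA parameters
model = torch.nn.Linear(64, 64).cuda()  # bitsandbytes 8-bit optimizers require a CUDA device

optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=5e-5,
    betas=(0.9, 0.999),
    weight_decay=1e-4,
    eps=1e-8,
)

# Gradient accumulation over 32 micro-batches, then clip and step
grad_accum_steps = 32
for _ in range(grad_accum_steps):
    loss = model(torch.randn(1, 64, device="cuda")).pow(2).mean() / grad_accum_steps
    loss.backward()

torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```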
Flow-GRPO Specific
- Reward Function: PickScore (human preference)
- Beta (KL penalty): 0.001 (how these values enter the training objective is sketched after this list)
- Clip Range: 0.2
- Advantage Clipping: Max 5.0
- Timestep Fraction: 0.2
- Guidance Scale: 3.5
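One plausible way these values enter the training objective is a PPO-style clipped surrogate loss with a KL penalty toward the frozen base model; the function below is a hedged sketch, not the authors' exact implementation.

```python
import torch

def flow_grpo_loss(log_prob, old_log_prob, ref_log_prob, advantage,
                   clip_range=0.2, adv_clip_max=5.0, beta=0.001):
    """Sketch of a clipped policy-gradient loss with a KL penalty.

    log_prob / old_log_prob / ref_log_prob: per-sample log-likelihoods of the
    sampled denoising trajectory under the current, behavior, and reference policies.
    """
    advantage = torch.clamp(advantage, -adv_clip_max, adv_clip_max)
    ratio = torch.exp(log_prob - old_log_prob)
    unclipped = -advantage * ratio
    clipped = -advantage * torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range)
    policy_loss = torch.maximum(unclipped, clipped).mean()
    kl_penalty = beta * (log_prob - ref_log_prob).mean()  # crude estimate of the KL to the reference model
    return policy_loss + kl_penalty
```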
Sampling Configuration
- Training Steps: 2 (denoising reduction)
- Evaluation Steps: 4
- Images per Prompt: 4
- Batches per Epoch: 4
Usage
With Diffusers
```python
import torch
from diffusers import FluxPipeline

# Load the base model
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load the LoRA weights under an explicit adapter name
pipe.load_lora_weights("ighoshsubho/lora-grpo-flux-dev", adapter_name="flow_grpo")

# Generate an image
prompt = "A serene landscape with mountains and a lake at sunset"
image = pipe(
    prompt,
    height=512,
    width=512,
    guidance_scale=3.5,
    num_inference_steps=20,
    max_sequence_length=256,
).images[0]
image.save("generated_image.png")
```
Adjusting LoRA Strength
```python
# Adjust the LoRA influence via the adapter name used when loading above
pipe.set_adapters(["flow_grpo"], adapter_weights=[0.8])  # 80% LoRA influence
```
Training Data & Objectives
- Dataset: Custom PickScore dataset for human preference alignment
- Prompt Function: General OCR prompts
- Optimization Target: Maximizing PickScore while maintaining image quality (a standard PickScore scoring snippet is shown after this list)
- KL Regularization: Prevents reward hacking and maintains model stability
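For reference, PickScore rewards are typically computed with the publicly released PickScore_v1 checkpoint; the snippet below follows the standard usage pattern from that release (the model and processor names come from the PickScore project, not from this repository).

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

processor = AutoProcessor.from_pretrained("laion/CLIP-ViT-H-14-laion2B-s32B-b79K")
model = AutoModel.from_pretrained("yuvalkirstain/PickScore_v1").eval()

def pickscore(prompt: str, image: Image.Image) -> float:
    """Preference score between a prompt and an image (higher is better)."""
    image_inputs = processor(images=image, return_tensors="pt")
    text_inputs = processor(text=prompt, padding=True, truncation=True,
                            max_length=77, return_tensors="pt")
    with torch.no_grad():
        image_embs = model.get_image_features(**image_inputs)
        image_embs = image_embs / image_embs.norm(dim=-1, keepdim=True)
        text_embs = model.get_text_features(**text_inputs)
        text_embs = text_embs / text_embs.norm(dim=-1, keepdim=True)
        score = model.logit_scale.exp() * (text_embs @ image_embs.T)
    return score.item()
```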
Performance Improvements
This model demonstrates improvements in:
- Human preference alignment through PickScore optimization
- Text rendering quality via OCR-focused training
- Compositional understanding enhanced by Flow-GRPO's exploration mechanism
- Stable training with minimal reward hacking due to KL regularization
Technical Notes
- Uses denoising reduction during training (2 steps) for efficiency
- Maintains full quality with standard inference steps (20-50)
- Trained with mixed precision (bfloat16) for memory efficiency
- 8-bit AdamW optimizer reduces memory footprint
- Gradient accumulation (32 steps) enables effective large batch training
Limitations
- Optimized for 512×512 resolution
- Focused on PickScore preferences (may not generalize to all aesthetic preferences)
- LoRA adaptation may have reduced capacity compared to full fine-tuning
Citation
If you use this model, please cite the Flow-GRPO paper:
```bibtex
@article{liu2025flow,
  title={Flow-GRPO: Training Flow Matching Models via Online RL},
  author={Liu, Jie and Liu, Gongye and Liang, Jiajun and Li, Yangguang and Liu, Jiaheng and Wang, Xintao and Wan, Pengfei and Zhang, Di and Ouyang, Wanli},
  journal={arXiv preprint arXiv:2505.05470},
  year={2025}
}
```
License
This LoRA is released under the Apache 2.0 License. Note that the base FLUX.1-dev weights are distributed under the FLUX.1 [dev] Non-Commercial License, whose terms continue to apply when this adapter is used with the base model.