LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis
This repository contains the model presented in the paper LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis.
The abstract of the paper is as follows:
We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 22.16% PNED gain, and LeX-FLUX outperforming baselines in color (+10.32%), positional (+5.60%), and font accuracy (+5.63%). The codes, models, datasets, and demo are publicly available.
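The abstract names PNED but does not define it here. As a rough illustration only, the sketch below computes a plain normalized edit distance between a target string and a recognized string (for example, OCR output from a generated image). This is an assumption about the kind of building block such a metric rests on, not the paper's PNED definition: the pairwise matching step is omitted, and the function names `edit_distance` and `normalized_edit_distance` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling one-row dynamic program."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_edit_distance(target: str, recognized: str) -> float:
    """Edit distance divided by the longer length, so the result lies in [0, 1].

    Hypothetical helper for illustration; not the paper's PNED.
    """
    longest = max(len(target), len(recognized))
    if longest == 0:
        return 0.0  # two empty strings are identical
    return edit_distance(target, recognized) / longest
```

Under this simplified definition, a score of 0 means the recognized text matches the target exactly, and a score of 1 means no characters could be reused.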
Usage of LeX-FLUX:
import torch
from diffusers import FluxPipeline
pipe = FluxPipeline.from_pretrained("X-ART/LeX-FLUX", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # Saves VRAM by offloading the model to CPU; remove this line if your GPU has enough memory
prompt = "The image features a bold, dramatic design centered around the text elements \"THE,\" \"RA,\" and \"SA4GONEARAz,\" arranged to form the title of *The Boulet Brothers Dragula Season Three*. The background is a textured, dark slate-gray surface with faint grunge patterns, adding a gritty, industrial vibe. The word \"THE\" is positioned at the top in large, jagged, blood-red letters with a glossy finish and slight drop shadows, evoking a horror-inspired aesthetic. Below it, \"RA\" appears in the middle-left section, rendered in metallic silver with a fragmented, cracked texture, while \"SA4GONEARAz\" curves dynamically to the right, its letters styled in neon-green and black gradients with angular, cyberpunk-inspired edges. The number \"4\" in \"SA4GONEARAz\" replaces an \"A,\" blending seamlessly into the stylized typography. Thin, glowing purple outlines highlight the text, contrasting against the dark backdrop. Subtle rays of violet and crimson light streak diagonally across the composition, casting faint glows around the letters. The overall layout balances asymmetry and cohesion, with sharp angles and a mix of organic and mechanical design elements, creating a visually intense yet polished aesthetic that merges gothic horror with futuristic edge."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0),
).images[0]
image.save("lex_flux_demo.png")