BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Abstract
Recently, state-of-the-art text-to-image generation models, such as Flux and Ideogram 2.0, have made significant progress in sentence-level visual text rendering. In this paper, we focus on the more challenging scenario of article-level visual text rendering and address the novel task of generating high-quality business content, including infographics and slides, based on user-provided article-level descriptive prompts and ultra-dense layouts. The fundamental challenges are twofold: significantly longer context lengths and the scarcity of high-quality business content data. In contrast to most previous works, which focus on a limited number of sub-regions and sentence-level prompts, ensuring precise adherence to ultra-dense layouts with tens or even hundreds of sub-regions in business content is far more challenging. We make two key technical contributions: (i) the construction of a scalable, high-quality business content dataset, Infographics-650K, equipped with ultra-dense layouts and prompts, built via a layer-wise retrieval-augmented infographic generation scheme; and (ii) a layout-guided cross-attention scheme, which injects tens of region-wise prompts into a set of cropped region latent spaces according to the ultra-dense layout and flexibly refines each sub-region during inference using a layout-conditional CFG. We demonstrate the strong results of our system compared to previous SOTA systems such as Flux and SD3 on our BizEval prompt set. Additionally, we conduct thorough ablation experiments to verify the effectiveness of each component. We hope our constructed Infographics-650K and BizEval can encourage the broader community to advance the progress of business content generation.
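To make the layout-guided cross-attention idea concrete, below is a minimal PyTorch sketch of the core mechanism described in the abstract: each cropped sub-region of the image latent attends only to the text tokens of its own region-wise prompt. This is not the authors' implementation; the names (`Region`, `region_cross_attention`), the averaging of overlapping regions, and the toy shapes are assumptions for illustration, and the layout-conditional CFG used at inference is not shown.

```python
# Hypothetical sketch of region-wise cross attention over cropped latent regions.
import torch
import torch.nn.functional as F
from dataclasses import dataclass

@dataclass
class Region:
    box: tuple                 # (top, left, bottom, right) in latent-grid coordinates
    prompt_emb: torch.Tensor   # (n_tokens, dim) embedding of this region's prompt

def region_cross_attention(latent, regions, to_q, to_k, to_v):
    """latent: (H, W, dim) image latent; regions: list of Region.

    Each cropped sub-region attends only to its own region-wise prompt,
    approximating the layout-guided cross-attention scheme.
    """
    H, W, dim = latent.shape
    out = torch.zeros_like(latent)
    weight = torch.zeros(H, W, 1)  # average contributions where regions overlap
    for reg in regions:
        t, l, b, r = reg.box
        q = to_q(latent[t:b, l:r].reshape(-1, dim))        # queries from cropped latent
        k = to_k(reg.prompt_emb)                           # keys from region prompt
        v = to_v(reg.prompt_emb)                           # values from region prompt
        attn = F.softmax(q @ k.T / dim ** 0.5, dim=-1) @ v # scaled dot-product attention
        out[t:b, l:r] += attn.reshape(b - t, r - l, dim)
        weight[t:b, l:r] += 1
    return out / weight.clamp(min=1)

# Toy usage with random weights and two regions (a title strip and a body block).
dim = 64
proj = lambda: torch.nn.Linear(dim, dim, bias=False)
to_q, to_k, to_v = proj(), proj(), proj()
latent = torch.randn(32, 32, dim)
regions = [
    Region(box=(0, 0, 8, 32), prompt_emb=torch.randn(12, dim)),   # title prompt
    Region(box=(8, 0, 32, 32), prompt_emb=torch.randn(77, dim)),  # body-text prompt
]
fused = region_cross_attention(latent, regions, to_q, to_k, to_v)
print(fused.shape)  # torch.Size([32, 32, 64])
```

In a real diffusion backbone this per-region attention would replace the usual global cross attention inside each transformer block, with the region boxes taken from the ultra-dense layout.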
Community
Infographic Text Rendering Method
I have only tried infographics on Flux. I've noticed that when we ask it to create a 'detailed' infographic and do not add extra text for context at the end of the prompt, the model (I believe it's the T5 encoder?) complains a lot! I usually read that mockup text, and once in a while there's an "Explain?" or a "Most Important Facts?"; once I even got a "What facts?" lol
Unfortunately I did not save them, but I don't think it's hard to reproduce.
And, as an example, when I asked it to
design a detailed infograph about "Aurora Australis", whimsical style, appealing narrative, sleek, ... (and here I added a brief explanation generated by an LLM),
Flux did not really use the text provided, but the narrative improved and I did not notice any 'questioning' or 'rambling.'
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LAYOUTDREAMER: Physics-guided Layout for Text-to-3D Compositional Scene Generation (2025)
- TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation (2025)
- UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing (2025)
- DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models (2025)
- Multitwine: Multi-Object Compositing with Text and Layout Control (2025)
- DICE: Distilling Classifier-Free Guidance into Text Embeddings (2025)
- ToLo: A Two-Stage, Training-Free Layout-To-Image Generation Framework For High-Overlap Layouts (2025)