arxiv:2503.20672

BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation

Published on Mar 26 · Submitted by Awiny on Mar 27

Abstract

Recently, state-of-the-art text-to-image generation models, such as Flux and Ideogram 2.0, have made significant progress in sentence-level visual text rendering. In this paper, we focus on the more challenging scenario of article-level visual text rendering and address a novel task of generating high-quality business content, including infographics and slides, based on user-provided article-level descriptive prompts and ultra-dense layouts. The fundamental challenges are twofold: significantly longer context lengths and the scarcity of high-quality business content data. In contrast to most previous works that focus on a limited number of sub-regions and sentence-level prompts, ensuring precise adherence to ultra-dense layouts with tens or even hundreds of sub-regions in business content is far more challenging. We make two key technical contributions: (i) the construction of a scalable, high-quality business content dataset, i.e., Infographics-650K, equipped with ultra-dense layouts and prompts, by implementing a layer-wise retrieval-augmented infographic generation scheme; and (ii) a layout-guided cross-attention scheme, which injects tens of region-wise prompts into a set of cropped region latent spaces according to the ultra-dense layouts, and refines each sub-region flexibly during inference using a layout-conditional CFG. We demonstrate the strong results of our system compared to previous SOTA systems such as Flux and SD3 on our BizEval prompt set. Additionally, we conduct thorough ablation experiments to verify the effectiveness of each component. We hope our constructed Infographics-650K and BizEval can encourage the broader community to advance the progress of business content generation.
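
To make the layout-guided cross-attention idea in the abstract more concrete, below is a minimal sketch, assuming a simple bounding-box layout and pre-computed per-region text embeddings, of how an attention mask can restrict each cropped latent sub-region to its own region-wise prompt. The function name, arguments, and sizes are illustrative assumptions rather than the paper's actual interface.

```python
# Minimal sketch of layout-guided cross attention with region-wise prompts.
# This is an illustration, not the authors' implementation; names such as
# `region_prompt_embeds` and `region_boxes` are hypothetical, and the learned
# query/key/value projections of a real diffusion backbone are omitted.
import torch
import torch.nn.functional as F

def layout_guided_cross_attention(latents, region_prompt_embeds, region_boxes,
                                  latent_hw, num_heads=8):
    """
    latents:              (B, N, C) image latent tokens, N = H * W patches
    region_prompt_embeds: list of (B, L_i, C) text embeddings, one per sub-region
    region_boxes:         list of (x0, y0, x1, y1) boxes in patch coordinates
    latent_hw:            (H, W) spatial size of the latent grid
    """
    B, N, C = latents.shape
    H, W = latent_hw

    # Concatenate all region prompts into a single key/value sequence.
    text = torch.cat(region_prompt_embeds, dim=1)  # (B, L_total, C)

    # Additive attention mask: a latent token inside box i may only attend
    # to the tokens of prompt i; everything else is masked with -inf.
    mask = torch.full((N, text.shape[1]), float("-inf"), device=latents.device)
    offset = 0
    for (x0, y0, x1, y1), emb in zip(region_boxes, region_prompt_embeds):
        L_i = emb.shape[1]
        inside = torch.zeros(H, W, dtype=torch.bool, device=latents.device)
        inside[y0:y1, x0:x1] = True
        mask[inside.flatten(), offset:offset + L_i] = 0.0
        offset += L_i

    # Latent tokens not covered by any box fall back to attending to all prompts.
    uncovered = torch.isinf(mask).all(dim=1)
    mask[uncovered] = 0.0

    # Multi-head cross attention (queries from latents, keys/values from text).
    q = latents.reshape(B, N, num_heads, C // num_heads).transpose(1, 2)
    k = text.reshape(B, -1, num_heads, C // num_heads).transpose(1, 2)
    v = k
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
    return out.transpose(1, 2).reshape(B, N, C)

# Toy usage with two stacked regions on a 32x32 latent grid (hypothetical sizes).
B, C, H, W = 1, 64, 32, 32
latents = torch.randn(B, H * W, C)
prompts = [torch.randn(B, 12, C), torch.randn(B, 20, C)]   # one prompt per region
boxes = [(0, 0, 32, 16), (0, 16, 32, 32)]                  # (x0, y0, x1, y1)
out = layout_guided_cross_attention(latents, prompts, boxes, (H, W))
print(out.shape)  # torch.Size([1, 1024, 64])
```

The layout-conditional CFG mentioned in the abstract can be pictured as ordinary classifier-free guidance whose guidance strength is modulated per sub-region over the same boxes; the precise formulation is given in the paper.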

Community

Paper submitter
edited 3 days ago

Infographic Text Rendering Method

I've only tried infographics on Flux. I've noticed that when we ask it to create a 'detailed' infographic and do not add extra text for context at the end of the prompt, I believe it's the T5 (?) that complains a lot! I usually read the mock-up text, and once in a while there's an "Explain?" or a "Most Important Facts?"; once I even got a "What facts?" lol

Unfortunately I did not save them, but I don't think they're hard to reproduce.

And, as an example, when I asked it to

design a detailed infograph about "Aurora Australis", whimsical style, appealing narrative, sleek, ... (then here I added a brief explanation generated by an LLM)

Flux did not really use the text provided, but the narrative improved and I did not notice any 'questioning' or 'rambling.'
