Story2Board: A Training-Free Approach for Expressive Storyboard Generation
Abstract
Story2Board generates expressive storyboards from natural language using a lightweight consistency framework that improves cross-panel coherence while preserving visual diversity, all without fine-tuning.
We present Story2Board, a training-free framework for expressive storyboard generation from natural language. Existing methods narrowly focus on subject identity, overlooking key aspects of visual storytelling such as spatial composition, background evolution, and narrative pacing. To address this, we introduce a lightweight consistency framework composed of two components: Latent Panel Anchoring, which preserves a shared character reference across panels, and Reciprocal Attention Value Mixing, which softly blends visual features between token pairs with strong reciprocal attention. Together, these mechanisms enhance coherence without architectural changes or fine-tuning, enabling state-of-the-art diffusion models to generate visually diverse yet consistent storyboards. To structure generation, we use an off-the-shelf language model to convert free-form stories into grounded panel-level prompts. To evaluate, we propose the Rich Storyboard Benchmark, a suite of open-domain narratives designed to assess layout diversity and background-grounded storytelling, in addition to consistency. We also introduce a new Scene Diversity metric that quantifies spatial and pose variation across storyboards. Our qualitative and quantitative results, as well as a user study, show that Story2Board produces more dynamic, coherent, and narratively engaging storyboards than existing baselines.
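To make the first mechanism concrete, here is a minimal PyTorch sketch of the anchoring idea as we understand it from the abstract: the shared reference panel's latent region is re-injected into every panel's latent at each denoising step, so all panels are denoised against the same character reference. The function name, tensor layout, and exact re-injection rule below are illustrative assumptions, not the paper's verbatim implementation.

```python
import torch

def anchor_panel_latents(latents: torch.Tensor,
                         anchor_latent: torch.Tensor,
                         anchor_rows: slice) -> torch.Tensor:
    """Hypothetical sketch of Latent Panel Anchoring.

    latents:       (B, C, H, W) per-panel latents at the current denoising step
    anchor_latent: (1, C, H, W) latent containing the shared reference panel
    anchor_rows:   rows of the composed layout occupied by the reference panel

    Copies the reference region into every panel latent so each panel is
    denoised against the same character reference.
    """
    latents = latents.clone()
    # Broadcasts the single reference latent across the whole batch.
    latents[:, :, anchor_rows, :] = anchor_latent[:, :, anchor_rows, :]
    return latents

# Schematic usage inside a denoising loop (scheduler and denoise_step are
# stand-ins for whatever diffusion pipeline is used):
# for t in scheduler.timesteps:
#     latents = anchor_panel_latents(latents, anchor_latent, slice(0, H // 2))
#     latents = denoise_step(latents, t)
```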
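Reciprocal Attention Value Mixing can likewise be sketched as an operation on one self-attention layer: token pairs that attend strongly to each other in both directions have their value vectors softly blended. Everything below (the min-based reciprocity score, the threshold `tau`, the blend weight `alpha`, and mixing each token with only its strongest partner) is an assumed illustration of the stated idea, not the paper's exact formulation.

```python
import torch

def reciprocal_attention_value_mixing(attn: torch.Tensor,
                                      values: torch.Tensor,
                                      alpha: float = 0.5,
                                      tau: float = 0.1) -> torch.Tensor:
    """Hypothetical sketch of Reciprocal Attention Value Mixing.

    attn:   (N, N) softmaxed self-attention weights among image tokens
    values: (N, D) value vectors for the same tokens

    Token pairs with strong reciprocal attention (i attends to j AND
    j attends to i) have their value vectors softly blended.
    """
    # Reciprocity score: a pair counts only as much as its weaker direction.
    recip = torch.minimum(attn, attn.transpose(0, 1))  # (N, N)
    recip = recip * (recip > tau)                      # keep only strong pairs
    # Each token mixes with its single strongest reciprocal partner.
    weight, partner = recip.max(dim=1)                 # (N,), (N,)
    w = (alpha * weight).unsqueeze(1)                  # (N, 1) per-token blend
    return (1.0 - w) * values + w * values[partner]
```

In practice such a blend would presumably be applied inside the diffusion transformer's attention blocks and restricted to cross-panel image-token pairs, so that the reference panel's appearance propagates to the other panels without overwriting their layout.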
Community
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- StorySync: Training-Free Subject Consistency in Text-to-Image Generation via Region Harmonization (2025)
- Local Prompt Adaptation for Style-Consistent Multi-Object Generation in Diffusion Models (2025)
- Audit & Repair: An Agentic Framework for Consistent Story Visualization in Text-to-Image Diffusion Models (2025)
- FairyGen: Storied Cartoon Video from a Single Child-Drawn Character (2025)
- Cut2Next: Generating Next Shot via In-Context Tuning (2025)
- Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation (2025)
- A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality (2025)