VLM-Guided Adaptive Negative Prompting for Creative Generation
Abstract
A method using vision-language models to enhance creative image generation by adaptively steering away from conventional concepts, improving novelty with minimal computational cost.
Creative generation is the synthesis of new, surprising, and valuable samples that reflect user intent yet cannot be envisioned in advance. This task aims to extend human imagination, enabling the discovery of visual concepts that exist in the unexplored spaces between familiar domains. While text-to-image diffusion models excel at rendering photorealistic scenes that faithfully match user prompts, they still struggle to generate genuinely novel content. Existing approaches to enhance generative creativity either rely on interpolation of image features, which restricts exploration to predefined categories, or require time-intensive procedures such as embedding optimization or model fine-tuning. We propose VLM-Guided Adaptive Negative-Prompting, a training-free, inference-time method that promotes creative image generation while preserving the validity of the generated object. Our approach utilizes a vision-language model (VLM) that analyzes intermediate outputs of the generation process and adaptively steers it away from conventional visual concepts, encouraging the emergence of novel and surprising outputs. We evaluate creativity through both novelty and validity, using statistical metrics in the CLIP embedding space. Through extensive experiments, we show consistent gains in creative novelty with negligible computational overhead. Moreover, unlike existing methods that primarily generate single objects, our approach extends to complex scenarios, such as generating coherent sets of creative objects and preserving creativity within elaborate compositional prompts. Our method integrates seamlessly into existing diffusion pipelines, offering a practical route to producing creative outputs that venture beyond the constraints of textual descriptions.
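To make the idea concrete, below is a minimal sketch of what an adaptive negative-prompting loop could look like on top of a standard diffusion pipeline. It assumes Stable Diffusion v1.5 via the diffusers library, and `query_vlm` is a hypothetical stub standing in for the VLM feedback step (in practice a LLaVA/GPT-4V-style model would be queried); the update frequency, prompt wording, and hyperparameters are illustrative choices, not the authors' exact implementation.

```python
# Sketch: adaptive negative prompting during denoising (assumptions noted in the lead-in).
import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
).to(device)

def encode(text: str) -> torch.Tensor:
    """Encode a prompt with the pipeline's CLIP text encoder."""
    tokens = pipe.tokenizer(
        text, padding="max_length", max_length=pipe.tokenizer.model_max_length,
        truncation=True, return_tensors="pt",
    )
    return pipe.text_encoder(tokens.input_ids.to(device))[0]

def query_vlm(image) -> str:
    """Hypothetical placeholder: ask a VLM which conventional concept the
    intermediate image resembles; here it just returns a fixed stub answer."""
    return "ordinary dog"

prompt = "a creative new kind of pet"
negative_concepts: list[str] = []          # grows as the VLM flags conventional concepts
guidance, num_steps, update_every = 7.5, 50, 10

cond = encode(prompt)
uncond = encode("")                         # start with an empty negative prompt

pipe.scheduler.set_timesteps(num_steps, device=device)
latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64, device=device, dtype=cond.dtype)
latents = latents * pipe.scheduler.init_noise_sigma

for i, t in enumerate(pipe.scheduler.timesteps):
    # Classifier-free guidance with the current (adaptively updated) negative prompt.
    latent_in = pipe.scheduler.scale_model_input(torch.cat([latents, latents]), t)
    with torch.no_grad():
        noise = pipe.unet(latent_in, t, encoder_hidden_states=torch.cat([uncond, cond])).sample
    noise_uncond, noise_cond = noise.chunk(2)
    noise = noise_uncond + guidance * (noise_cond - noise_uncond)
    latents = pipe.scheduler.step(noise, t, latents).prev_sample

    # Periodically decode the intermediate latent and steer away from whatever
    # conventional concept the VLM recognizes in it.
    if (i + 1) % update_every == 0 and i + 1 < num_steps:
        with torch.no_grad():
            image = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
        image = pipe.image_processor.postprocess(image, output_type="pil")[0]
        concept = query_vlm(image)
        if concept and concept not in negative_concepts:
            negative_concepts.append(concept)
            uncond = encode(", ".join(negative_concepts))  # refresh negative embedding

with torch.no_grad():
    final = pipe.vae.decode(latents / pipe.vae.config.scaling_factor).sample
pipe.image_processor.postprocess(final, output_type="pil")[0].save("creative_pet.png")
```

Because only the negative text embedding is refreshed every few steps, the loop adds a handful of VAE decodes and VLM calls on top of a standard sampling run, which is consistent with the "negligible computational overhead" claim in the abstract.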
Community
T2I models excel at realism, but true creativity means generating what doesn't exist yet. How do you prompt for something you can't describe?
We introduce VLM-Guided Adaptive Negative Prompting: an inference-time method that promotes creative image generation.
For more details check out our paper and project page:
Paper: https://arxiv.org/abs/2510.10715
Project: https://shelley-golan.github.io/VLM-Guided-Creative-Generation/
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- No Concept Left Behind: Test-Time Optimization for Compositional Text-to-Image Generation (2025)
- VMDiff: Visual Mixing Diffusion for Limitless Cross-Object Synthesis (2025)
- Few-shot multi-token DreamBooth with LoRa for style-consistent character generation (2025)
- Iterative Prompt Refinement for Safer Text-to-Image Generation (2025)
- Describe, Don't Dictate: Semantic Image Editing with Natural Language Intent (2025)
- World-To-Image: Grounding Text-to-Image Generation with Agent-Driven World Knowledge (2025)
- Single-Reference Text-to-Image Manipulation with Dual Contrastive Denoising Score (2025)