Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling Paper • 2504.13169 • Published Apr 17 • 39
Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint Paper • 2505.23759 • Published May 29 • 6
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published Apr 22 • 63
Atlas: Multi-Scale Attention Improves Long Context Image Modeling Paper • 2503.12355 • Published Mar 16 • 12
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence Paper • 2305.14334 • Published May 23, 2023 • 1
Readout Guidance: Learning Control from Diffusion Features Paper • 2312.02150 • Published Dec 4, 2023 • 3
Bootstrapping Objectness from Videos by Relaxed Common Fate and Visual Grouping Paper • 2304.08025 • Published Apr 17, 2023 • 2
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models Paper • 2305.13655 • Published May 23, 2023 • 7