High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning • Paper 2507.05920 • Published 2 days ago • 9
Inject Semantic Concepts into Image Tagging for Open-Set Recognition • Paper 2310.15200 • Published Oct 23, 2023 • 6
Tag2Text: Guiding Vision-Language Model via Image Tagging • Paper 2303.05657 • Published Mar 10, 2023 • 1