VLMs - a shawon Collection

VLMs

updated May 17

PointArena: Probing Multimodal Grounding Through Language-Guided Pointing

Paper • 2505.09990 • Published May 15 • 12

Style Customization of Text-to-Vector Generation with Image Diffusion Priors

Paper • 2505.10558 • Published May 15 • 15

Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis

Paper • 2505.10046 • Published May 15 • 9

X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real

Paper • 2505.07096 • Published May 11 • 4