PointArena: Probing Multimodal Grounding Through Language-Guided Pointing • Paper • 2505.09990 • Published 10 days ago • 11
Style Customization of Text-to-Vector Generation with Image Diffusion Priors • Paper • 2505.10558 • Published 10 days ago • 15
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis • Paper • 2505.10046 • Published 10 days ago • 9
X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real • Paper • 2505.07096 • Published 13 days ago • 3