Vision-Language Models Struggle to Align Entities across Modalities • Paper • 2503.03854 • Published Mar 5