Can Vision-Language Models Answer Face to Face Questions in the Real-World? Paper • 2503.19356 • Published 23 days ago • 2 • 2