Can Vision Language Models Infer Human Gaze Direction? A Controlled Study Paper • 2506.05412 • Published Jun 2025 • 4
4D-LRM: Large Space-Time Reconstruction Model From and To Any View at Any Time Paper • 2506.18890 • Published Jun 2025 • 4
VEGGIE: Instructional Editing and Reasoning of Video Concepts with Grounded Generation Paper • 2503.14350 • Published Mar 18, 2025
Vision-Language Models Are Not Pragmatically Competent in Referring Expression Generation Paper • 2504.16060 • Published Apr 22, 2025
DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences Paper • 2406.03008 • Published Jun 5, 2024
Vision-and-Language Navigation Today and Tomorrow: A Survey in the Era of Foundation Models Paper • 2407.07035 • Published Jul 9, 2024
Training Turn-by-Turn Verifiers for Dialogue Tutoring Agents: The Curious Case of LLMs as Your Coding Tutors Paper • 2502.13311 • Published Feb 18, 2025 • 1
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions Paper • 2406.09264 • Published Jun 13, 2024 • 2
Towards Collaborative Plan Acquisition through Theory of Mind Modeling in Situated Dialogue Paper • 2305.11271 • Published May 18, 2023
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation Paper • 2402.16846 • Published Feb 26, 2024
DOROTHIE: Spoken Dialogue for Handling Unexpected Situations in Interactive Autonomous Driving Agents Paper • 2210.12511 • Published Oct 22, 2022
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models Paper • 2306.08685 • Published Jun 14, 2023 • 1
DANLI: Deliberative Agent for Following Natural Language Instructions Paper • 2210.12485 • Published Oct 22, 2022
Towards A Holistic Landscape of Situated Theory of Mind in Large Language Models Paper • 2310.19619 • Published Oct 30, 2023
CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation Paper • 2310.13165 • Published Oct 19, 2023