Multimodal Knowledge Alignment with Reinforcement Learning Paper • 2205.12630 • Published May 25, 2022
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models Paper • 2404.02575 • Published Apr 3, 2024 • 51
Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding Paper • 2406.18925 • Published Jun 27, 2024 • 1
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! Paper • 2410.01023 • Published Oct 1, 2024 • 2
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics Paper • 2406.14703 • Published Jun 20, 2024 • 2
Teaching Metric Distance to Autoregressive Multimodal Foundational Models Paper • 2503.02379 • Published Mar 4 • 4
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18 • 19
Explain with Visual Keypoints Like a Real Mentor! A Benchmark for Multimodal Solution Explanation Paper • 2504.03197 • Published Apr 4 • 1
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published 21 days ago • 36
EgoSpeak: Learning When to Speak for Egocentric Conversational Agents in the Wild Paper • 2502.14892 • Published Feb 17 • 6
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation Paper • 2501.09284 • Published Jan 16 • 10
CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction Paper • 2410.01273 • Published Oct 2, 2024 • 10