Persona Dynamics: Unveiling the Impact of Personality Traits on Agents in Text-Based Games Paper • 2504.06868 • Published Apr 9 • 1
Don't Look Only Once: Towards Multimodal Interactive Reasoning with Selective Visual Revisitation Paper • 2505.18842 • Published May 24 • 37
When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research Paper • 2505.11855 • Published May 17 • 9
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents Paper • 2306.10376 • Published Jun 17, 2023
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you! Paper • 2410.01023 • Published Oct 1, 2024 • 2
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics Paper • 2406.14703 • Published Jun 20, 2024 • 2
VisEscape: A Benchmark for Evaluating Exploration-driven Decision-making in Virtual Escape Rooms Paper • 2503.14427 • Published Mar 18 • 19
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding Paper • 2411.19527 • Published Nov 29, 2024 • 10
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation Paper • 2410.13232 • Published Oct 17, 2024 • 45