A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published 5 days ago
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published 5 days ago
ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows Paper • 2505.19897 • Published 13 days ago
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published 20 days ago
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning Paper • 2504.08672 • Published Apr 11
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization Paper • 2504.10127 • Published Apr 14
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization Paper • 2503.16874 • Published Mar 21
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published Mar 21
φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation Paper • 2503.13288 • Published Mar 17
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction Paper • 2503.11227 • Published Mar 14
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models Paper • 2501.18119 • Published Jan 30
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents Paper • 2410.23218 • Published Oct 30, 2024
PathReasoner: Modeling Reasoning Path with Equivalent Extension for Logical Question Answering Paper • 2405.19109 • Published May 29, 2024
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models Paper • 2406.11736 • Published Jun 17, 2024