A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published 3 days ago • 13
VideoScore2: Think before You Score in Generative Video Evaluation Paper • 2509.22799 • Published 9 days ago • 22
CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization Paper • 2509.21150 • Published 10 days ago • 1
Zero-Shot Long-Form Video Understanding through Screenplay Paper • 2406.17309 • Published Jun 25, 2024 • 1
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1, 2025 • 43
StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs Paper • 2505.20139 • Published May 26, 2025 • 19
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 71
Human-Aligned Faithfulness in Toxicity Explanations of LLMs Paper • 2506.19113 • Published Jun 23, 2025 • 1
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models Paper • 2501.19054 • Published Jan 31, 2025 • 10
UniPredict: Large Language Models are Universal Tabular Classifiers Paper • 2310.03266 • Published Oct 5, 2023
Large Language Models are Fixated by Red Herrings: Exploring Creative Problem Solving and Einstellung Effect using the Only Connect Wall Dataset Paper • 2306.11167 • Published Jun 19, 2023 • 2
CLEVR Parser: A Graph Parser Library for Geometric Learning on Language Grounded Image Scenes Paper • 2009.09154 • Published Sep 19, 2020
Adaptive Shells for Efficient Neural Radiance Field Rendering Paper • 2311.10091 • Published Nov 16, 2023 • 20