ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition Paper • 2503.21248 • Published 4 days ago • 16
Mirror: A Universal Framework for Various Information Extraction Tasks Paper • 2311.05419 • Published Nov 9, 2023
Seal-Tools: Self-Instruct Tool Learning Dataset for Agent Tuning and Detailed Benchmark Paper • 2405.08355 • Published May 14, 2024
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models Paper • 2410.11805 • Published Oct 15, 2024 • 13
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models Paper • 2503.16779 • Published 10 days ago
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning (2411.18203). Critic-V has been accepted by CVPR 2025! Bonus: VRI-160K is now uploaded: di-zhang-fdu/R1-Vision-Reasoning-Instructions
GameFactory: Creating New Games with Generative Interactive Videos Paper • 2501.08325 • Published Jan 14 • 65
ConceptMaster: Multi-Concept Video Customization on Diffusion Transformer Models Without Test-Time Tuning Paper • 2501.04698 • Published Jan 8 • 15
TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation Paper • 2412.14642 • Published Dec 19, 2024 • 4