- MLLM-as-a-Judge for Image Safety without Human Labeling
  Paper • 2501.00192 • Published • 31
- 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
  Paper • 2501.00958 • Published • 108
- Xmodel-2 Technical Report
  Paper • 2412.19638 • Published • 27
- HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs
  Paper • 2412.18925 • Published • 102
Collections including paper arxiv:2504.17192

- PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters
  Paper • 2504.08791 • Published • 125
- TTRL: Test-Time Reinforcement Learning
  Paper • 2504.16084 • Published • 90
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 81

- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 63
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 118
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 111
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 122

- BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing
  Paper • 2503.13434 • Published • 26
- Edit Transfer: Learning Image Editing via Vision In-Context Relations
  Paper • 2503.13327 • Published • 29
- WideRange4D: Enabling High-Quality 4D Reconstruction with Wide-Range Movements and Scenes
  Paper • 2503.13435 • Published • 17
- MediaTek-Research/Llama-Breeze2-8B-Instruct
  Updated • 2.04k • 35

- Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
  Paper • 2502.02508 • Published • 23
- Chain of Draft: Thinking Faster by Writing Less
  Paper • 2502.18600 • Published • 48
- Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
  Paper • 2406.02818 • Published
- Chain-of-Retrieval Augmented Generation
  Paper • 2501.14342 • Published • 56

- RuCCoD: Towards Automated ICD Coding in Russian
  Paper • 2502.21263 • Published • 133
- Unified Reward Model for Multimodal Understanding and Generation
  Paper • 2503.05236 • Published • 123
- Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching
  Paper • 2503.05179 • Published • 46
- R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
  Paper • 2503.05592 • Published • 27

- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 111
- SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
  Paper • 2503.11576 • Published • 99
- ToolRL: Reward is All Tool Learning Needs
  Paper • 2504.13958 • Published • 40
- OTC: Optimal Tool Calls via Reinforcement Learning
  Paper • 2504.14870 • Published • 32

- SurveyX: Academic Survey Automation via Large Language Models
  Paper • 2502.14776 • Published • 100
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  Paper • 2408.06292 • Published • 125
- Towards an AI co-scientist
  Paper • 2502.18864 • Published • 49
- Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
  Paper • 2504.17192 • Published • 81

- PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
  Paper • 2502.14282 • Published • 20
- PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving
  Paper • 2502.16111 • Published • 9
- Agent models: Internalizing Chain-of-Action Generation into Reasoning models
  Paper • 2503.06580 • Published • 17
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
  Paper • 2308.08155 • Published • 7