M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models Paper • 2504.10449 • Published Apr 14 • 14
nvidia/Llama-3.1-Nemotron-8B-UltraLong-2M-Instruct Text Generation • 8B • Updated Apr 17 • 1.12k • 15
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15 • 60
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 13 • 149
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning Paper • 2505.01441 • Published Apr 28 • 39
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Paper • 2505.07263 • Published May 12 • 30
MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining Paper • 2505.07608 • Published May 12 • 81
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in Large Language Models Paper • 2505.02847 • Published May 1 • 28
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7 • 27
LLM-Independent Adaptive RAG: Let the Question Speak for Itself Paper • 2505.04253 • Published May 7 • 13
ZeroSearch: Incentivize the Search Capability of LLMs without Searching Paper • 2505.04588 • Published May 7 • 66
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 178
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL Paper • 2505.02391 • Published May 5 • 25
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published May 5 • 32
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 94
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Paper • 2504.21117 • Published Apr 29 • 26
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes Paper • 2504.11544 • Published Apr 15 • 42
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Paper • 2504.13816 • Published Apr 18 • 17
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large Reasoning Models Paper • 2505.10554 • Published May 15 • 120
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models Paper • 2504.16074 • Published Apr 22 • 36
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents Paper • 2504.15785 • Published Apr 22 • 19
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning Paper • 2505.08617 • Published May 13 • 42
Improving Assembly Code Performance with Large Language Models via Reinforcement Learning Paper • 2505.11480 • Published May 16 • 8
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via texttt{D}ual-texttt{H}ead texttt{O}ptimization Paper • 2505.07675 • Published May 12 • 20
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning Paper • 2505.13866 • Published May 20 • 16
ZeroGUI: Automating Online GUI Learning at Zero Human Cost Paper • 2505.23762 • Published May 29 • 46
rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset Paper • 2505.21297 • Published May 27 • 30
DiffDecompose: Layer-Wise Decomposition of Alpha-Composited Images via Diffusion Transformers Paper • 2505.21541 • Published May 24 • 7
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published May 29 • 23
Interleaved Reasoning for Large Language Models via Reinforcement Learning Paper • 2505.19640 • Published May 26 • 13
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models Paper • 2505.17225 • Published May 22 • 65
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 97
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos Paper • 2506.05349 • Published Jun 5 • 24
Runtime error 15 15 Leaderboard: Physical Reasoning from Video 🏃 Submit and score model predictions for video and text tasks
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 128
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis Paper • 2506.06276 • Published Jun 6 • 22
ComfyUI-R1: Exploring Reasoning Models for Workflow Generation Paper • 2506.09790 • Published Jun 11 • 52
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs Paper • 2506.14731 • Published Jun 17 • 9
Marrying Autoregressive Transformer and Diffusion with Multi-Reference Autoregression Paper • 2506.09482 • Published Jun 11 • 46
ViCrit: A Verifiable Reinforcement Learning Proxy Task for Visual Perception in VLMs Paper • 2506.10128 • Published Jun 11 • 23
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence Paper • 2506.15677 • Published about 1 month ago • 24
LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs Paper • 2506.14429 • Published Jun 17 • 44
Show-o2: Improved Native Unified Multimodal Models Paper • 2506.15564 • Published about 1 month ago • 29
mistralai/Mistral-Small-3.2-24B-Instruct-2506 Image-Text-to-Text • 24B • Updated 11 days ago • 126k • 367
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models Paper • 2506.14435 • Published Jun 17 • 8
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation Paper • 2506.18349 • Published 26 days ago • 12
Arch-Router: Aligning LLM Routing with Human Preferences Paper • 2506.16655 • Published 29 days ago • 10
SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning Paper • 2506.19767 • Published 24 days ago • 13
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published 25 days ago • 56
ReCode: Updating Code API Knowledge with Reinforcement Learning Paper • 2506.20495 • Published 23 days ago • 8
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published 24 days ago • 44
Chain-of-Experts: Unlocking the Communication Power of Mixture-of-Experts Models Paper • 2506.18945 • Published 26 days ago • 39
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published 18 days ago • 68
GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning Paper • 2507.01006 • Published 17 days ago • 188
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published 22 days ago • 15
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence with Spatial Reasoning and Understanding Paper • 2506.23219 • Published 20 days ago • 7
Teaching a Language Model to Speak the Language of Tools Paper • 2506.23394 • Published 19 days ago • 4
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation Paper • 2507.02608 • Published 16 days ago • 21
DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation Paper • 2506.20639 • Published 23 days ago • 27
Rethinking Verification for LLM Code Generation: From Generation to Testing Paper • 2507.06920 • Published 9 days ago • 28
VLM2Vec-V2: Advancing Multimodal Embedding for Videos, Images, and Visual Documents Paper • 2507.04590 • Published 12 days ago • 16
Robust Multimodal Large Language Models Against Modality Conflict Paper • 2507.07151 • Published 10 days ago • 5
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published 4 days ago • 51
BlockFFN: Towards End-Side Acceleration-Friendly Mixture-of-Experts with Chunk-Level Activation Sparsity Paper • 2507.08771 • Published 7 days ago • 7
A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning Paper • 2507.08267 • Published 8 days ago • 8
KV Cache Steering for Inducing Reasoning in Small Language Models Paper • 2507.08799 • Published 7 days ago • 37
EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes Paper • 2507.11407 • Published 3 days ago • 40
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs Paper • 2507.09477 • Published 6 days ago • 59