Testerpce's Collections
Reasoning
Contrastive Decoding Improves Reasoning in Large Language Models
Paper • 2309.09117 • Published • 39

Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
Paper • 2310.08491 • Published • 55

Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
Paper • 2411.04282 • Published • 36

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Paper • 2411.14432 • Published • 26

Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning
Paper • 2412.15797 • Published • 18

Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper • 2501.05366 • Published • 102

Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 114

Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning
Paper • 2502.03275 • Published • 17

Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 140

One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Paper • 2502.10454 • Published • 7

I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper • 2502.10458 • Published • 35

Entropy-Regularized Process Reward Model
Paper • 2412.11006 • Published

What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
Paper • 2412.15904 • Published

R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27

GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Paper • 2503.10639 • Published • 50

GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Paper • 2503.08525 • Published • 17

OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 23

What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models
Paper • 2503.24235 • Published • 53

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback
Paper • 2503.22230 • Published • 44

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62

VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks
Paper • 2504.05118 • Published • 25

FlowReasoner: Reinforcing Query-Level Meta-Agents
Paper • 2504.15257 • Published • 43