OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published 7 days ago • 66
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published 16 days ago • 8
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning Paper • 2503.04973 • Published Mar 6 • 23
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning Paper • 2408.10075 • Published Aug 19, 2024
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation Paper • 2502.20583 • Published Feb 27 • 12
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning Paper • 2502.01100 • Published Feb 3 • 17
Don't throw away your value model! Making PPO even better via Value-Guided Monte-Carlo Tree Search decoding Paper • 2309.15028 • Published Sep 26, 2023 • 1
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts Paper • 2310.02255 • Published Oct 3, 2023 • 2
Crystal: Introspective Reasoners Reinforced with Self-Feedback Paper • 2310.04921 • Published Oct 7, 2023 • 1
NaturalProofs: Mathematical Theorem Proving in Natural Language Paper • 2104.01112 • Published Mar 24, 2021
Minds versus Machines: Rethinking Entailment Verification with Language Models Paper • 2402.03686 • Published Feb 6, 2024 • 1
NaturalProver: Grounded Mathematical Proof Generation with Language Models Paper • 2205.12910 • Published May 25, 2022
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering Paper • 2210.03078 • Published Oct 6, 2022 • 1
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback Paper • 2406.09279 • Published Jun 13, 2024 • 3