PotentialApplication - a floom Collection



Let LLMs Break Free from Overthinking via Self-Braking Tuning

Paper • 2505.14604 • Published May 20 • 23
AGENTIF: Benchmarking Instruction Following of Large Language Models in Agentic Scenarios

Paper • 2505.16944 • Published May 22 • 8
Training Step-Level Reasoning Verifiers with Formal Verification Tools

Paper • 2505.15960 • Published May 21 • 7
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

Paper • 2505.15134 • Published May 21 • 6
Reward Reasoning Model

Paper • 2505.14674 • Published May 20 • 37
General-Reasoner: Advancing LLM Reasoning Across All Domains

Paper • 2505.14652 • Published May 20 • 23
Fine-tuning Quantized Neural Networks with Zeroth-order Optimization

Paper • 2505.13430 • Published May 19 • 10
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training

Paper • 2505.14681 • Published May 20 • 9
The Hallucination Tax of Reinforcement Finetuning

Paper • 2505.13988 • Published May 20 • 8
TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

Paper • 2505.18125 • Published May 23 • 113
QwenLong-CPRS: Towards ∞-LLMs with Dynamic Context Optimization

Paper • 2505.18092 • Published May 23 • 44
Synthetic Data RL: Task Definition Is All You Need

Paper • 2505.17063 • Published May 18 • 10
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning

Paper • 2505.16022 • Published May 21 • 3
ARM: Adaptive Reasoning Model

Paper • 2505.20258 • Published May 26 • 45
Learning to Reason without External Rewards

Paper • 2505.19590 • Published May 26 • 29
Interleaved Reasoning for Large Language Models via Reinforcement Learning

Paper • 2505.19640 • Published May 26 • 13
Rethinking the Sampling Criteria in Reinforcement Learning for LLM Reasoning: A Competence-Difficulty Alignment Perspective

Paper • 2505.17652 • Published May 23 • 6
UFT: Unifying Supervised and Reinforcement Fine-Tuning

Paper • 2505.16984 • Published May 22 • 3
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs

Paper • 2505.19075 • Published May 25 • 21
Reinforcing Multi-Turn Reasoning in LLM Agents via Turn-Level Credit Assignment

Paper • 2505.11821 • Published May 17 • 14
Text2Grad: Reinforcement Learning from Natural Language Feedback

Paper • 2505.22338 • Published May 28 • 8
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

Paper • 2505.22617 • Published May 28 • 129
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Paper • 2506.03295 • Published Jun 3 • 17
ConfQA: Answer Only If You Are Confident

Paper • 2506.07309 • Published Jun 8 • 10
ExpertLongBench: Benchmarking Language Models on Expert-Level Long-Form Generation Tasks with Structured Checklists

Paper • 2506.01241 • Published Jun 2 • 9
Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward

Paper • 2506.05433 • Published Jun 5 • 4
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling

Paper • 2506.08672 • Published Jun 10 • 31
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

Paper • 2505.24726 • Published May 30 • 269
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning

Paper • 2504.21370 • Published Apr 30 • 1
Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy

Paper • 2507.01352 • Published Jul 2 • 53
Stabilizing Knowledge, Promoting Reasoning: Dual-Token Constraints for RLVR

Paper • 2507.15778 • Published about 1 month ago • 19
Sculptor: Empowering LLMs with Cognitive Agency via Active Context Management

Paper • 2508.04664 • Published 14 days ago • 12
Can LLM-Generated Textual Explanations Enhance Model Classification Performance? An Empirical Study

Paper • 2508.09776 • Published 7 days ago • 3
Aryabhata: An exam-focused language model for JEE Math

Paper • 2508.08665 • Published 9 days ago • 16
Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published 6 days ago • 24
A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems

Paper • 2508.07407 • Published 10 days ago • 82
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning

Paper • 2508.08221 • Published 9 days ago • 39
Prompt Orchestration Markup Language

Paper • 2508.13948 • Published 1 day ago • 23