Learning Explainable Dense Reward Shapes via Bayesian Optimization Paper • 2504.16272 • Published 11 days ago • 5
LawFlow: Collecting and Simulating Lawyers' Thought Processes Paper • 2504.18942 • Published 7 days ago • 4
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models Paper • 2504.20157 • Published 5 days ago • 32
ScholaWrite: A Dataset of End-to-End Scholarly Writing Process Paper • 2502.02904 • Published Feb 5 • 2
A Dataset of Peer Reviews (PeerRead): Collection, Insights and NLP Applications Paper • 1804.09635 • Published Apr 25, 2018
Read, Revise, Repeat: A System Demonstration for Human-in-the-loop Iterative Text Revision Paper • 2204.03685 • Published Apr 7, 2022
CoEdIT: Text Editing by Task-Specific Instruction Tuning Paper • 2305.09857 • Published May 17, 2023 • 7
Prefer to Classify: Improving Text Classifiers via Auxiliary Preference Learning Paper • 2306.04925 • Published Jun 8, 2023
Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks Paper • 2212.01350 • Published Dec 2, 2022 • 1
Benchmarking Cognitive Biases in Large Language Models as Evaluators Paper • 2309.17012 • Published Sep 29, 2023 • 3
Under the Surface: Tracking the Artifactuality of LLM-Generated Data Paper • 2401.14698 • Published Jan 26, 2024
SelectLLM: Can LLMs Select Important Instructions to Annotate? Paper • 2401.16553 • Published Jan 29, 2024 • 3
Chain-of-Instructions: Compositional Instruction Tuning on Large Language Models Paper • 2402.11532 • Published Feb 18, 2024 • 1
i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment Paper • 2406.11280 • Published Jun 17, 2024
Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback Paper • 2402.03746 • Published Feb 6, 2024