Learning Explainable Dense Reward Shapes via Bayesian Optimization • Paper • 2504.16272 • Published 11 days ago • 5
LawFlow : Collecting and Simulating Lawyers' Thought Processes • Paper • 2504.18942 • Published 7 days ago • 4
Toward Evaluative Thinking: Meta Policy Optimization with Evolving Reward Models • Paper • 2504.20157 • Published 5 days ago • 32
ScholaWrite: A Dataset of End-to-End Scholarly Writing Process • Paper • 2502.02904 • Published Feb 5 • 2
CoEdIT • Collection • Collection of the publicly available CoEdIT dataset and instruction-tuned models for text editing. • 6 items • Updated Apr 15, 2024 • 6
CoEdIT: Text Editing by Task-Specific Instruction Tuning • Paper • 2305.09857 • Published May 17, 2023 • 7