Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 129
cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning Paper • 2505.22914 • Published May 28 • 35
view article Article ⚗️ 🔥 Building High-Quality Datasets with distilabel and Prometheus 2 By burtenshaw • Jun 3, 2024 • 27