CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization Paper • 2507.06181 • Published 12 days ago • 39
Configurable Preference Tuning Collection CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17 • 1
Configurable Preference Tuning with Rubric-Guided Synthetic Data Paper • 2506.11702 • Published Jun 13 • 2
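A minimal sketch of the inference-time control idea the CPT collection describes, assuming a CPT-tuned chat model is available; the checkpoint name and the rubric-style system prompt below are placeholders for illustration, not the collection's actual artifacts.

```python
# Minimal sketch (not the CPT reference implementation): steering generation
# at inference time with a rubric-style system prompt. The checkpoint name
# below is a placeholder, and the exact prompt wording is an assumption.
from transformers import pipeline

generator = pipeline("text-generation", model="your-org/cpt-tuned-model")  # placeholder checkpoint

messages = [
    # The system prompt carries the behavioral specification the model was tuned to follow.
    {"role": "system", "content": "Respond in a terse, formal register; avoid first-person voice."},
    {"role": "user", "content": "Explain what preference tuning does."},
]

output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1]["content"])  # assistant reply in the requested style
```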
Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit Paper • 2506.06607 • Published Jun 7 • 2
Atropos Artifacts Collection A collection of experimental artifacts created with Atropos, Nous' RL Environments framework - https://github.com/NousResearch/Atropos • 9 items • Updated May 26 • 10
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 97
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published Apr 17 • 34
ReZero: Enhancing LLM search ability by trying one-more-time Paper • 2504.11001 • Published Apr 15 • 15
Custom Vibe Coding Quest Part 1: The Quest Begins Article • By burtenshaw • Mar 26 • 9
Custom Vibe Coding Quest Part 2: Fine-Tuning Gemma 3 for Code Reasoning Article • By burtenshaw • Apr 1 • 25
DeepHermes Collection Preview models of the hybrid-reasoner Hermes series • 6 items • Updated Mar 13 • 39
DPO Collection Various useful datasets for preference optimization • 18 items • Updated Jun 9 • 5
MetaSC: Test-Time Safety Specification Optimization for Language Models Paper • 2502.07985 • Published Feb 11 • 3
Toxic Commons Collection Tools for de-toxifying public domain data, especially multilingual and historical text data and data with OCR errors. • 3 items • Updated Oct 31, 2024 • 6