Inference-Time Scaling for Generalist Reward Modeling Paper • 2504.02495 • Published Apr 3 • 54
DNA-R1 Collection Reasoning model distilled from DeepSeek-R1, enhanced with GRPO using supplementary reasoning datasets. • 1 item • Updated 6 days ago • 2
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 21 items • Updated 20 days ago • 135
Mamba: Linear-Time Sequence Modeling with Selective State Spaces Paper • 2312.00752 • Published Dec 1, 2023 • 143