Skywork-Reward-V2: Scaling Preference Data Curation via Human-AI Synergy Paper • 2507.01352 • Published 2 days ago • 9
Skywork-Reward-V2 Collection Scaling preference data curation to the extreme • 9 items • Updated about 3 hours ago • 6
Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning Paper • 2507.00432 • Published 3 days ago • 44
Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity Paper • 2505.21411 • Published May 27 • 15
DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published Apr 15 • 18
Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs Paper • 2506.14245 • Published 17 days ago • 37
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning Paper • 2506.08989 • Published 23 days ago • 14
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 165
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published 28 days ago • 62
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published May 30 • 132
view article Article Prefill and Decode for Concurrent Requests - Optimizing LLM Performance By tngtech • Apr 16 • 18
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published Apr 25 • 45
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15 • 60
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning Paper • 2504.11456 • Published Apr 15 • 13