Astra: Toward General-Purpose Mobile Robots via Hierarchical Multimodal Learning Paper • 2506.06205 • Published 9 days ago • 27
Just as Humans Need Vaccines, So Do Models: Model Immunization to Combat Falsehoods Paper • 2505.17870 • Published 23 days ago • 5
Cache Me if You Can: Accelerating Diffusion Models through Block Caching Paper • 2312.03209 • Published Dec 6, 2023 • 21
RealHarm: A Collection of Real-World Language Model Application Failures Paper • 2504.10277 • Published Apr 14 • 11
view article Article You could have designed state of the art positional encoding By FL33TW00D-HF • Nov 25, 2024 • 300
Min P Sampling: Balancing Creativity and Coherence at High Temperature Paper • 2407.01082 • Published Jul 1, 2024 • 1
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM By ariG23498 and 3 others • Mar 12 • 432
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning Paper • 2502.18080 • Published Feb 25 • 2
view article Article From Files to Chunks: Improving Hugging Face Storage Efficiency By jsulz and 1 other • Nov 20, 2024 • 61
The Differences Between Direct Alignment Algorithms are a Blur Paper • 2502.01237 • Published Feb 3 • 115
view article Article DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge By NormalUhr • Feb 7 • 152
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate Paper • 2501.17703 • Published Jan 29 • 59
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published Jan 8 • 280
Better & Faster Large Language Models via Multi-token Prediction Paper • 2404.19737 • Published Apr 30, 2024 • 78
Training Large Language Models to Reason in a Continuous Latent Space Paper • 2412.06769 • Published Dec 9, 2024 • 86
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12, 2024 • 15