Article SmolLM3: smol, multilingual, long-context reasoner By loubnabnl and 22 others • 2 days ago • 412
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders By thomwolf and 1 other • 1 day ago • 331
Easy Dataset: A Unified and Extensible Framework for Synthesizing LLM Fine-Tuning Data from Unstructured Documents Paper • 2507.04009 • Published 5 days ago • 19
Should We Still Pretrain Encoders with Masked Language Modeling? Paper • 2507.00994 • Published 9 days ago • 71
Energy-Based Transformers are Scalable Learners and Thinkers Paper • 2507.02092 • Published 7 days ago • 43
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs Paper • 2507.02778 • Published 7 days ago • 9
Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 By tomaarsen and 1 other • 9 days ago • 85
Is There a Case for Conversation Optimized Tokenizers in Large Language Models? Paper • 2506.18674 • Published 17 days ago • 8
Outlier-Safe Pre-Training for Robust 4-Bit Quantization of Large Language Models Paper • 2506.19697 • Published 16 days ago • 44
Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test Paper • 2506.21551 • Published 14 days ago • 27
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated about 15 hours ago • 203
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published 28 days ago • 70
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published 24 days ago • 252
Domain2Vec: Vectorizing Datasets to Find the Optimal Data Mixture without Training Paper • 2506.10952 • Published 28 days ago • 23
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published about 1 month ago • 18
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5 • 42