- Effective Distillation to Hybrid xLSTM Architectures (arXiv:2603.15590, published 2 days ago)
- TildeOpen LLM: Leveraging Curriculum Learning to Achieve Equitable Language Representation (arXiv:2603.08182, published 10 days ago)
- Data Repetition Beats Data Scaling in Long-CoT Supervised Fine-Tuning (arXiv:2602.11149, published Feb 11)
- FineInstructions: Scaling Synthetic Instructions to Pre-Training Scale (arXiv:2601.22146, published Jan 29)
- Bolmo: Byteifying the Next Generation of Language Models (arXiv:2512.15586, published Dec 17, 2025)
- Beyond URLs: Metadata Diversity and Position for Efficient LLM Pretraining (arXiv:2511.21613, published Nov 26, 2025)
- Gaperon: A Peppered English-French Generative Language Model Suite (arXiv:2510.25771, published Oct 29, 2025)
- Mask and You Shall Receive: Optimizing Masked Language Modeling for Pretraining BabyLMs (arXiv:2510.20475, published Oct 23, 2025)
- The Art of Asking: Multilingual Prompt Optimization for Synthetic Data (arXiv:2510.19806, published Oct 22, 2025)
- Empirical Evaluation of Knowledge Distillation from Transformers to Subquadratic Language Models (arXiv:2504.14366, published Apr 19, 2025)
- The German Commons: 154 Billion Tokens of Openly Licensed Text for German Language Models (arXiv:2510.13996, published Oct 15, 2025)