SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • arXiv:2502.02737 • Published Feb 4, 2025 • 232 upvotes
Article • Timm ❤️ Transformers: Use any timm model with transformers • By ariG23498 and 4 others • Published Jan 16, 2025 • 50 upvotes (usage sketch after this list)
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing • Paper • arXiv:2502.14458 • Published Feb 20, 2025 • 2 upvotes
The Mamba in the Llama: Distilling and Accelerating Hybrid Models • Paper • arXiv:2408.15237 • Published Aug 27, 2024 • 42 upvotes
BlackMamba: Mixture of Experts for State-Space Models • Paper • arXiv:2402.01771 • Published Feb 1, 2024 • 26 upvotes
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry • Paper • arXiv:2402.04347 • Published Feb 6, 2024 • 15 upvotes
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence • Paper • arXiv:2404.05892 • Published Apr 8, 2024 • 39 upvotes
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • arXiv:2312.00752 • Published Dec 1, 2023 • 143 upvotes
Article • Bamba: Inference-Efficient Hybrid Mamba2 Model • By rganti and 28 others • Published Dec 18, 2024 • 55 upvotes
Article • Hugging Face and JFrog partner to make AI Security more transparent • By mcpotato and 1 other • Published Mar 4, 2025 • 21 upvotes
Trained Models 🏋️ • Collection • They may be small, but they're training like giants! • 8 items • Updated Dec 3, 2024 • 20 upvotes
Instella ✨ • Collection • Announcing Instella, a series of 3-billion-parameter language models developed by AMD, trained from scratch on 128 Instinct MI300X GPUs. • 9 items • Updated 14 days ago • 7 upvotes
Phi-4 • Collection • Phi-4 family of small language, multi-modal, and reasoning models. • 13 items • Updated May 1, 2025 • 154 upvotes
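The Timm ❤️ Transformers article listed above is about loading timm checkpoints through the standard transformers Auto classes. Below is a minimal sketch of that pattern, assuming a recent transformers release that includes the timm integration and that timm is installed; the checkpoint name and image URL are illustrative choices, not taken from the article.

```python
# Minimal sketch: use a timm checkpoint from the Hub via the transformers Auto classes.
# Assumes a recent transformers release with the timm integration and `timm` installed
# (pip install transformers timm pillow); checkpoint and image URL are illustrative.
import torch
from urllib.request import urlopen
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "timm/resnet50.a1_in1k"  # any timm checkpoint on the Hub (assumed example)

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(urlopen(url))

inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print("predicted class index:", logits.argmax(-1).item())
```

The integration is also intended to work with the higher-level APIs, so the same checkpoint should be loadable through pipeline("image-classification", model=checkpoint) as well.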