SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model • Paper • arXiv:2502.02737 • Published Feb 4, 2025 • 232 upvotes
Article • Timm ❤️ Transformers: Use any timm model with transformers • By ariG23498 and 4 others • Published Jan 16, 2025 • 50 upvotes (usage sketch after this list)
Llamba: Scaling Distilled Recurrent Models for Efficient Language Processing • Paper • arXiv:2502.14458 • Published Feb 20, 2025 • 2 upvotes
The Mamba in the Llama: Distilling and Accelerating Hybrid Models • Paper • arXiv:2408.15237 • Published Aug 27, 2024 • 42 upvotes
BlackMamba: Mixture of Experts for State-Space Models • Paper • arXiv:2402.01771 • Published Feb 1, 2024 • 26 upvotes
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry • Paper • arXiv:2402.04347 • Published Feb 6, 2024 • 15 upvotes
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence • Paper • arXiv:2404.05892 • Published Apr 8, 2024 • 39 upvotes
Mamba: Linear-Time Sequence Modeling with Selective State Spaces • Paper • arXiv:2312.00752 • Published Dec 1, 2023 • 143 upvotes
Article • Bamba: Inference-Efficient Hybrid Mamba2 Model • By rganti and 28 others • Published Dec 18, 2024 • 55 upvotes
Article • Hugging Face and JFrog partner to make AI Security more transparent • By mcpotato and 1 other • Published Mar 4, 2025 • 21 upvotes
Trained Models 🏋️ • Collection • They may be small, but they're training like giants! • 8 items • Updated Dec 3, 2024 • 20 upvotes
Instella ✨ • Collection • Announcing Instella, a series of 3-billion-parameter language models developed by AMD, trained from scratch on 128 Instinct MI300X GPUs. • 9 items • Updated 14 days ago • 7 upvotes
Phi-4 • Collection • Phi-4 family of small language, multi-modal, and reasoning models. • 13 items • Updated May 1, 2025 • 154 upvotes
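The Timm ❤️ Transformers article listed above is about loading timm checkpoints through the standard transformers Auto classes. Below is a minimal sketch of that pattern, assuming a recent transformers release that includes the timm integration and that timm is installed; the checkpoint name and image URL are illustrative choices, not taken from the article.

```python
# Minimal sketch: use a timm checkpoint from the Hub via the transformers Auto classes.
# Assumes a recent transformers release with the timm integration and `timm` installed
# (pip install transformers timm pillow); checkpoint and image URL are illustrative.
import torch
from urllib.request import urlopen
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

checkpoint = "timm/resnet50.a1_in1k"  # any timm checkpoint on the Hub (assumed example)

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(urlopen(url))

inputs = processor(image, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

print("predicted class index:", logits.argmax(-1).item())
```

The integration is also intended to work with the higher-level APIs, so the same checkpoint should be loadable through pipeline("image-classification", model=checkpoint) as well.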