Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 2025 • 129
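The core mechanism is sketchable: score compressed key blocks per query, keep only the top-k blocks, and run dense attention inside them. Below is a minimal NumPy sketch of that selection step; the block size, the mean-pooling compression, and all names are illustrative assumptions, not the paper's exact design, which also trains the selection end to end with hardware-aligned block kernels.

```python
import numpy as np

def topk_block_sparse_attention(q, K, V, block_size=4, k_blocks=2):
    """Toy top-k block-sparse attention for a single query vector.

    Assumptions (illustrative only): key blocks are "compressed" by
    mean pooling, and the query attends to the k_blocks best blocks.
    """
    d = q.shape[-1]
    n_blocks = K.shape[0] // block_size
    Kb = K[: n_blocks * block_size].reshape(n_blocks, block_size, d)
    Vb = V[: n_blocks * block_size].reshape(n_blocks, block_size, d)

    # Block-level importance scores from mean-pooled keys.
    block_scores = Kb.mean(axis=1) @ q / np.sqrt(d)   # (n_blocks,)
    top = np.argsort(block_scores)[-k_blocks:]        # kept block indices

    # Dense softmax attention restricted to the selected blocks.
    K_sel, V_sel = Kb[top].reshape(-1, d), Vb[top].reshape(-1, d)
    logits = K_sel @ q / np.sqrt(d)
    w = np.exp(logits - logits.max())
    return (w / w.sum()) @ V_sel

rng = np.random.default_rng(0)
q = rng.standard_normal(16)
K = rng.standard_normal((64, 16))
V = rng.standard_normal((64, 16))
print(topk_block_sparse_attention(q, K, V).shape)  # (16,)
```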
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models Paper • 2502.09604 • Published Feb 2025 • 29
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU Paper • 2502.08910 • Published Feb 2025 • 139
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 2025 • 187
Transformers Can Navigate Mazes With Multi-Step Prediction Paper • 2412.05117 • Published Dec 6, 2024 • 5
Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 99
Article SauerkrautLM's Multi-Phase Spectrum Training: A Technical Deep Dive By DavidGF • Nov 9, 2024 • 9
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization Paper • 2411.02355 • Published Nov 4, 2024 • 48
Fast Matrix Multiplications for Lookup Table-Quantized LLMs Paper • 2407.10960 • Published Jul 15, 2024 • 12
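In lookup-table quantization, weights are stored as small integer codes plus a per-group table of float values, and a matmul first dequantizes by table lookup. Here is a minimal NumPy sketch of those mechanics, assuming 4-bit codes, a group size of 16, and a uniform codebook; real schemes use non-uniform tables, and the paper's contribution is fast fused GPU kernels for the lookup, not Python like this.

```python
import numpy as np

def lut_quantize(W, bits=4, group_size=16):
    """Quantize each weight group to indices into a 2**bits-entry table."""
    out_f, in_f = W.shape
    Wg = W.reshape(out_f, in_f // group_size, group_size)
    lo = Wg.min(axis=2, keepdims=True)
    hi = Wg.max(axis=2, keepdims=True)
    # Per-group lookup table: evenly spaced values in [lo, hi]
    # (a real scheme would use a non-uniform, data-aware table).
    luts = lo + (hi - lo) * np.linspace(0.0, 1.0, 2 ** bits)
    # Codes: index of the nearest table entry for each weight.
    codes = np.argmin(np.abs(Wg[..., None] - luts[:, :, None, :]), axis=-1)
    return codes.astype(np.uint8), luts

def lut_matmul(x, codes, luts):
    """y = x @ W_hat.T, where W_hat is rebuilt by table lookup."""
    out_f = codes.shape[0]
    W_hat = np.take_along_axis(luts, codes.astype(np.int64), axis=2)
    return x @ W_hat.reshape(out_f, -1).T

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 32))
x = rng.standard_normal((2, 32))
codes, luts = lut_quantize(W)
y = lut_matmul(x, codes, luts)
print(y.shape, np.abs(y - x @ W.T).max())  # close to the fp matmul
```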
Strong German fp8 LLM's Collection Strong large language models for the German language in fp8 format • 6 items • Updated Sep 24, 2024 • 3
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform Paper • 2405.03003 • Published May 5, 2024 • 8
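The idea behind this kind of Fourier-domain PEFT is that a dense weight update can be parameterized by a handful of trainable spectral coefficients at fixed random frequency positions, with an inverse 2D DFT producing the full-size update. A minimal NumPy sketch under those assumptions; the names and the scale handling are illustrative, not the paper's exact formulation.

```python
import numpy as np

def fourier_delta_w(coeffs, positions, shape, scale=1.0):
    """Rebuild a dense weight update from n learned spectral coefficients.

    Only `coeffs` (n scalars) would be trainable; `positions` are fixed
    randomly chosen entries of the frequency plane.
    """
    spectrum = np.zeros(shape, dtype=np.complex128)
    spectrum[positions[:, 0], positions[:, 1]] = coeffs
    # The inverse 2D DFT turns the sparse spectrum into a dense update.
    return scale * np.fft.ifft2(spectrum).real

rng = np.random.default_rng(0)
d_out, d_in, n = 64, 64, 100   # n trainable scalars vs. d_out*d_in for full FT
positions = np.stack([rng.integers(0, d_out, n),
                      rng.integers(0, d_in, n)], axis=1)
coeffs = rng.standard_normal(n)            # the only trainable parameters
delta_W = fourier_delta_w(coeffs, positions, (d_out, d_in))
print(delta_W.shape)  # (64, 64) dense update from 100 parameters
```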
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11, 2024 • 32
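As a companion sketch: GaLore-style training keeps optimizer state in a low-rank gradient subspace obtained from an SVD of the gradient, and Q-GaLore's twist, per the title, is storing that projector in INT4 and adapting how often it is refreshed per layer. Below is a minimal NumPy sketch of the projection plus a naive symmetric INT4 quantizer; the layer-adaptive refresh schedule and the paper's exact quantizer are not modeled here.

```python
import numpy as np

def lowrank_projector(grad, rank):
    """Top-`rank` left singular vectors of the gradient (GaLore-style)."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]                      # (m, r)

def quantize_int4(P):
    """Naive symmetric per-tensor INT4 quantization of the projector."""
    scale = np.abs(P).max() / 7.0
    q = np.clip(np.round(P / scale), -8, 7).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
G = rng.standard_normal((256, 64))          # full gradient of one weight matrix
P = lowrank_projector(G, rank=8)
qP, s = quantize_int4(P)                    # store the projector in INT4
P_hat = qP.astype(np.float64) * s           # dequantize on use

G_low = P_hat.T @ G                         # (8, 64): optimizer state lives here
update = -0.01 * G_low                      # stand-in for Adam on G_low
full_update = P_hat @ update                # project back to the full shape
print(G_low.shape, full_update.shape)       # (8, 64) (256, 64)
```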