Changelog Introducing HF Jobs: Run scalable compute jobs on Hugging Face • 13 days ago • 90
Article A failed experiment: Infini-Attention, and why we should keep trying? By neuralink and 2 others • Aug 14, 2024 • 69
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 165
Article OpenReasoning-Nemotron: A Family of State-of-the-Art Distilled Reasoning Models By nvidia and 3 others • 25 days ago • 47
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24 • 41
Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.28k
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 50
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7 • 147
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 30
Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 877
Article Welcome to Inference Providers on the Hub 🔥 By julien-c and 6 others • Jan 28 • 487
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 80
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes Paper • 2306.13649 • Published Jun 23, 2023 • 22
Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024 • 21
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency Paper • 2409.02634 • Published Sep 4, 2024 • 98
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23, 2024 • 14
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community By Leyo and 2 others • Apr 15, 2024 • 185
Longhorn: State Space Models are Amortized Online Learners Paper • 2407.14207 • Published Jul 19, 2024 • 18