view article Article NVIDIA's GTC 2025 Announcement for Physical AI Developers: New Open Models and Datasets 7 days ago • 29
nGPT: Normalized Transformer with Representation Learning on the Hypersphere Paper • 2410.01131 • Published Oct 1, 2024 • 10
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 146
view article Article π0 and π0-FAST: Vision-Language-Action Models for General Robot Control Feb 4 • 122
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control Paper • 2405.04798 • Published May 8, 2024 • 1
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction Paper • 2502.07316 • Published Feb 11 • 47
view article Article Introducing multi-backends (TRT-LLM, vLLM) support for Text Generation Inference Jan 16 • 71
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 208
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published Jan 28 • 27
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch Paper • 2501.18512 • Published Jan 30 • 28