Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +5 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra • 3 days ago • 31
The MiniMax-M2 Series: Mini Activations Unleashing Max Real-World Intelligence Paper • 2605.26494 • Published 4 days ago • 32
Look Before You Leap: Autonomous Exploration for LLM Agents Paper • 2605.16143 • Published 15 days ago • 9
📊 DNA benchmarks Collection Zero-shot DNA benchmarks for Variant Effect prediction, Sequence Recovery and Perturbation tasks. • 5 items • Updated 11 days ago • 9
Laguna XS.2 Collection Designed for agentic coding and long-horizon work on a local machine. Apache 2.0. • 5 items • Updated 23 days ago • 23
NVIDIA Nemotron v3 Collection Open, Production-ready Enterprise Models • 18 items • Updated about 16 hours ago • 298
ISO-Bench: Can Coding Agents Optimize Real-World Inference Workloads? Paper • 2602.19594 • Published Feb 23 • 3
Structured Distillation of Web Agent Capabilities Enables Generalization Paper • 2604.07776 • Published Apr 9 • 23
Locate, Steer, and Improve: A Practical Survey of Actionable Mechanistic Interpretability in Large Language Models Paper • 2601.14004 • Published Jan 20 • 48
💧 LFM2.5 Collection Collection of post-trained and base LFM2.5 models. • 33 items • Updated 2 days ago • 142
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published Apr 27 • 24
Crystalite: A Lightweight Transformer for Efficient Crystal Modeling Paper • 2604.02270 • Published Apr 2 • 1
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills Paper • 2603.25158 • Published Mar 26 • 53
Article SynthVision: Building a 110K Synthetic Medical VQA Dataset with Cross-Model Validation OpenMed • Mar 23 • 17
Self-Improving Pretraining: using post-trained models to pretrain better models Paper • 2601.21343 • Published Jan 29 • 19