Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23 • 78
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11 • 56
Vision Language Models (Better, Faster, Stronger) Article • By merve and 4 others • May 12 • 468
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 2.74k
microsoft/Phi-3.5-vision-instruct Image-Text-to-Text • 4B • Updated Sep 26, 2024 • 1.01M • 692