Paper: MoBE: Mixture-of-Basis-Experts for Compressing MoE-based LLMs • arXiv:2508.05257 • Published 7 days ago
Article: Accelerate ND-Parallel: A Guide to Efficient Multi-GPU Training • By siro1 and 4 others • 6 days ago
Collection: Tanuki-8B — a Japanese LLM trained fully from scratch with an architecture similar to Llama-3-8B (to be released publicly after NEDO approval) • 4 items • Updated Jun 12, 2024