---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-Instruct-150K
language:
- en
base_model:
- microsoft/Phi-3.5-mini-instruct
pipeline_tag: text-generation
---

# 🎉 CompeteSMoE-5.1B

CompeteSMoE-5.1B is a lightweight, integrated variant of the Mixture-of-Experts (MoE) architecture, built upon the Phi-3.5 Mini and SigLIP baselines. This version incorporates the latest CompeteSMoE algorithm enhancements.

CompeteSMoE-5.1B demonstrates strong performance across a range of MoE routing strategies, including both standard and state-of-the-art routing methods. It achieves competitive results against recent MoE architectures such as SharedE-V2 and SharedE-V3, which are inspired by DeepSeek. Despite the architectural innovations of these models, especially their use of shared experts, CompeteSMoE-5.1B consistently delivers superior or comparable results.

📝 Note: This version of CompeteSMoE-5.1B was trained on a small-scale dataset. 🚧 We're actively working on a stronger, more robust release, coming soon! 🚀 Stay tuned for updates. 💡

### Hardware Resources

| Stage          | MoE Method  | Hardware |
|----------------|-------------|----------|
| Pre-Training   |             | 4xH100   |
| Pre-FineTuning |             | 4xH100   |
| VIT            | CompeteSMoE | 4xH100   |

---

### Citation Information

More details can be found in our paper. If you use CompeteSMoE, please cite it using this BibTeX:

```
@misc{nguyen2025competesmoe,
      title={CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition},
      author={Nam V. Nguyen and Huy Nguyen and Quang Pham and Van Nguyen and Savitha Ramasamy and Nhat Ho},
      year={2025},
      eprint={2505.13380},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}
```
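
### Quickstart (illustrative)

A minimal loading sketch, assuming the checkpoint is published on the Hugging Face Hub with custom modeling code. The repository id below is a placeholder, and the exact processor classes and prompt format depend on the code released with this checkpoint, so treat this as a starting point rather than a verified example.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<org>/CompeteSMoE-5.1B"  # placeholder: replace with the actual Hub repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # the MoE and vision components are assumed to ship as custom code
)

prompt = "Describe the Mixture-of-Experts idea in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```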
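
### Competition routing, in brief (toy sketch)

For intuition only: the sketch below illustrates the general idea of competition-based expert selection suggested by the paper title, in which experts compete through the strength of their responses and the strongest responders are activated. This is not the authors' implementation; the module names, the response-norm scoring rule, and the top-k mixing are assumptions made purely for illustration.

```
import torch
import torch.nn as nn

class ToyCompetitionMoE(nn.Module):
    """Toy illustration: experts compete by response norm; the top-k winners are mixed."""

    def __init__(self, dim: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim). In this toy setting every expert computes a response.
        responses = torch.stack([expert(x) for expert in self.experts], dim=1)  # (batch, E, dim)
        scores = responses.norm(dim=-1)                                         # (batch, E) competition scores
        weights = torch.softmax(scores, dim=-1)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)                       # winners of the competition
        top_w = top_w / top_w.sum(dim=-1, keepdim=True)                         # renormalize over the winners
        gathered = responses.gather(1, top_idx.unsqueeze(-1).expand(-1, -1, responses.size(-1)))
        return (top_w.unsqueeze(-1) * gathered).sum(dim=1)

x = torch.randn(2, 16)
print(ToyCompetitionMoE(16)(x).shape)  # torch.Size([2, 16])
```

In the method described by the paper, a learned router is additionally trained to approximate the outcome of this competition, so that not every expert has to be evaluated at inference time; see the paper for the exact formulation and guarantees.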