# Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity ### Model Introduction ![arch.PNG](https://raw.gitcode.com/ascend-tribe/pangu-pro-moe/blobs/7c83eb5c52ab91ba4bf2f8235ac1d0b1f9b49a7d/arch.PNG) We introduce a novel Mixture of Grouped Experts (MoGE) architecture that partitions experts into distinct groups during the selection phase. By enforcing an equal number of expert activations per group for each token, MoGE inherently achieves load balancing across devices. Leveraging this architecture, we have developed the Pangu Pro MoE model with the following specifications: - Vocabulary Size: 153,376 - Layers: 48 - MoGE Configuration: 4 shared experts, 64 routing experts grouped into 8 clusters with 1 expert activated per group - Training Phases: Pretraining and Post-training - Pretraining Corpus: 15TB For detailed technical documentation, please refer to: - **Chinese Technical Report**: [盘古 Pro MoE:昇腾原生的分组混合专家模型](https://gitcode.com/ascend-tribe/pangu-pro-moe/blob/main/Pangu-Pro-MoE-CN-Report.pdf) - **English Technical Report**: [Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity](https://arxiv.org/abs/2505.21411) ## Inference Examples The acceleration code for the [Ascend inference acceleration code](https://gitcode.com/ascend-tribe/ascend-inference-system), along with supporting software versions of MindIE and vLLM-Ascend, has been officially released. The quantized weights will be rolled out in the near term. We kindly invite you to stay tuned for the upcoming release. #### Transformers Inference Environment Dependencies: ```bash torch>=2.1.0 torch-npu>=2.1.0.post8.dev20241029 CANN>=8.0.RC3 transformers>=4.48.2 ``` The following provides a simple inference example of Pangu Pro MoE based on the `transformers` framework: ```python import torch import torch_npu from transformers import AutoModelForCausalLM, AutoTokenizer from transformers import GenerationConfig model_local_path = "path_to_Pangu_Pro_MoE" generation_config = GenerationConfig( do_sample=True, top_k=50, top_p=0.95, temperature=0.6 ) # load the tokenizer and the model tokenizer = AutoTokenizer.from_pretrained( model_local_path, use_fast=False, trust_remote_code=True, local_files_only=True ) model = AutoModelForCausalLM.from_pretrained( model_local_path, trust_remote_code=True, torch_dtype="auto", device_map="auto", local_files_only=True ) # prepare the model input prompt = "Give me a short introduction to large language model." messages = [ {"role": "system", "content": "你必须严格遵守法律法规和社会道德规范。生成任何内容时,都应避免涉及暴力、色情、恐怖主义、种族歧视、性别歧视等不当内容。一旦检测到输入或输出有此类倾向,应拒绝回答并发出警告。例如,如果输入内容包含暴力威胁或色情描述,应返回错误信息:“您的输入包含不当内容,无法处理。"}, # define your system prompt here {"role": "user", "content": prompt} ] text = tokenizer.apply_chat_template( messages, tokenize=False, add_generation_prompt=True ) # text: [unused9]系统:[unused10][unused9]用户:Give me a short introduction to large language model.[unused10][unused9]助手: model_inputs = tokenizer([text], return_tensors="pt").to(model.device) # model_inputs.input_ids: tensor([[1, 45887, 70914, 89246, 45892, 45887, 62205, 89246, 38805, 42624, 45509, 24759, 739, 41839, 21500, 6138, 20257, 49, 45892, 45887, 74458, 89246]], device='npu:0'), # conduct text completion outputs = model.generate(**model_inputs, max_new_tokens=32768, eos_token_id=45892, return_dict_in_generate=True, generation_config=generation_config) input_length = model_inputs.input_ids.shape[1] generated_tokens = outputs.sequences[:, input_length:] output_sent = tokenizer.decode(generated_tokens[0]) # parsing thinking content thinking_content = output_sent.split("[unused17]")[0].split("[unused16]")[-1].strip() content = output_sent.split("[unused17]")[-1].split("[unused10]")[0].strip() print("\nthinking content:", thinking_content) print("\ncontent:", content) ``` #### MindSpore Inference Environment Dependencies: ```python mindspore>=2.6.0 vllm>=0.8.3 CANN>=8.1.RC1.beta1 ``` For detailed instructions, please refer to [Pangu Pro MoE vLLM+MindSpore Deployment Instructions](https://gitee.com/mindspore/vllm-mindspore/blob/pangu-pro-moe/docs/model_cards/pangu/pangu_pro_moe.md). ## Integrity Check Please refer to the following methods to verify the integrity of the downloaded content. The hash values are stored in the `checklist.chk` file. ``` #!/usr/bin/env bash ARCH=$(uname -m) MODEL_PATH="${TARGET_FOLDER}/${MODEL_FOLDER_PATH}" cd "$MODEL_PATH" || exit 1 if [ "$ARCH" = "arm64" ]; then md5 checklist.chk else md5sum -c checklist.chk fi ``` ## Model License Pangu Pro MoE model is licensed under the Pangu Model License Agreement, which is intended to be used permissively and enable the further development of artificial intelligence technologies. Please refer to the `LICENSE` file located in the root directory of the model repository for details. ## Disclaimer Due to the technical limitations inherent in the technology on which the Pangu Pro MoE (“Model”) relies and the fact that the artificial intelligence generated content is automatically produced by Model, we cannot make any guarantees regarding the following matters: 1. The output of this Model is automatically generated via AI algorithms, it does not rule out the possibility that some of the information may be flawed, unreasonable, or cause discomfort, and the generated content does not represent Huawei's attitude or standpoint; 2. There is no guarantee that this Model is 100% accurate, reliable, functional, timely, secure and safety, error-free, uninterrupted, continuously stable, or free of any faults; 3. The output of this Model does not constitute any advices or decisions for you, and it does not guarantee the authenticity, completeness, accuracy, timeliness, legality, functionality, or practicality of the generated content. The generated content cannot replace professionals in medical, legal, and other fields in answering your questions. The generated content is for your reference only and does not represent any attitude, standpoint, or position of Huawei. You need to make independent judgments based on your actual situation, and Huawei does not assume any responsibilities. ## Citation If our work is helpful for your research or projects, we appreciate your citation. ```bibtex @article{tang2025pangu, title={Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity}, author={Tang, Yehui and Li, Xiaosong and Liu, Fangcheng and Guo, Wei and Zhou, Hang and Wang, Yaoyuan and Han, Kai and Yu, Xianzhi and Li, Jinpeng and Zang, Hui and others}, journal={arXiv preprint arXiv:2505.21411}, year={2025} } ``` ## Contact If you have any question, please raise an issue or contact us at pangutech@huawei.com