# Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity

### Model Introduction

![arch.PNG](https://raw.gitcode.com/ascend-tribe/pangu-pro-moe/blobs/7c83eb5c52ab91ba4bf2f8235ac1d0b1f9b49a7d/arch.PNG)

We introduce a novel Mixture of Grouped Experts (MoGE) architecture that partitions experts into distinct groups during the selection phase. By enforcing an equal number of expert activations per group for each token, MoGE inherently achieves load balancing across devices. Leveraging this architecture, we have developed the Pangu Pro MoE model with the following specifications:

- Vocabulary Size: 153,376
- Layers: 48
- MoGE Configuration: 4 shared experts, 64 routing experts grouped into 8 clusters with 1 expert activated per group
- Training Phases: Pretraining and Post-training
- Pretraining Corpus: 15TB

For detailed technical documentation, please refer to:

- **Chinese Technical Report**: [盘古 Pro MoE：昇腾原生的分组混合专家模型](https://gitcode.com/ascend-tribe/pangu-pro-moe/blob/main/Pangu-Pro-MoE-CN-Report.pdf)
- **English Technical Report**: [Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity](https://arxiv.org/abs/2505.21411)


## Inference Examples

The acceleration code for the [Ascend inference acceleration code](https://gitcode.com/ascend-tribe/ascend-inference-system), along with supporting software versions of MindIE and vLLM-Ascend, has been officially released. The quantized weights will be rolled out in the near term. We kindly invite you to stay tuned for the upcoming release.

#### Transformers Inference

Environment Dependencies：

```bash
torch>=2.1.0
torch-npu>=2.1.0.post8.dev20241029
CANN>=8.0.RC3
transformers>=4.48.2
```

The following provides a simple inference example of Pangu Pro MoE based on the `transformers` framework:

```python
import torch
import torch_npu
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import GenerationConfig

model_local_path = "path_to_Pangu_Pro_MoE"

generation_config = GenerationConfig(
    do_sample=True,
    top_k=50,
    top_p=0.95,
    temperature=0.6
)

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(
    model_local_path, 
    use_fast=False, 
    trust_remote_code=True,
    local_files_only=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_local_path,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
    local_files_only=True
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "你必须严格遵守法律法规和社会道德规范。生成任何内容时，都应避免涉及暴力、色情、恐怖主义、种族歧视、性别歧视等不当内容。一旦检测到输入或输出有此类倾向，应拒绝回答并发出警告。例如，如果输入内容包含暴力威胁或色情描述，应返回错误信息：“您的输入包含不当内容，无法处理。"}, # define your system prompt here
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# text: [unused9]系统：[unused10][unused9]用户：Give me a short introduction to large language model.[unused10][unused9]助手：
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# model_inputs.input_ids: tensor([[1, 45887, 70914, 89246, 45892, 45887, 62205, 89246, 38805, 42624, 45509, 24759, 739, 41839, 21500, 6138, 20257, 49, 45892, 45887, 74458, 89246]], device='npu:0'),

# conduct text completion
outputs = model.generate(**model_inputs, max_new_tokens=32768, eos_token_id=45892, return_dict_in_generate=True, generation_config=generation_config)

input_length = model_inputs.input_ids.shape[1]
generated_tokens = outputs.sequences[:, input_length:]
output_sent = tokenizer.decode(generated_tokens[0])

# parsing thinking content
thinking_content = output_sent.split("[unused17]")[0].split("[unused16]")[-1].strip()
content = output_sent.split("[unused17]")[-1].split("[unused10]")[0].strip()

print("\nthinking content:", thinking_content)
print("\ncontent:", content)
```

#### MindSpore Inference

Environment Dependencies：

```python
mindspore>=2.6.0
vllm>=0.8.3
CANN>=8.1.RC1.beta1
```

For detailed instructions, please refer to [Pangu Pro MoE vLLM+MindSpore Deployment Instructions](https://gitee.com/mindspore/vllm-mindspore/blob/pangu-pro-moe/docs/model_cards/pangu/pangu_pro_moe.md).

## Integrity Check

Please refer to the following methods to verify the integrity of the downloaded content. The hash values are stored in the `checklist.chk` file.

```
#!/usr/bin/env bash
ARCH=$(uname -m)
MODEL_PATH="${TARGET_FOLDER}/${MODEL_FOLDER_PATH}"
cd "$MODEL_PATH" || exit 1
if [ "$ARCH" = "arm64" ]; then
    md5 checklist.chk
else
    md5sum -c checklist.chk
fi
```

## Model License

Pangu Pro MoE model is licensed under the Pangu Model License Agreement, which is intended to be used permissively and enable the further development of artificial intelligence technologies. Please refer to the `LICENSE` file located in the root directory of the model repository for details.

## Disclaimer

Due to the technical limitations inherent in the technology on which the Pangu Pro MoE (“Model”) relies and the fact that the artificial intelligence generated content is automatically produced by Model, we cannot make any guarantees regarding the following matters:

1. The output of this Model is automatically generated via AI algorithms, it does not rule out the possibility that some of the information may be flawed, unreasonable, or cause discomfort, and the generated content does not represent Huawei's attitude or standpoint;
2. There is no guarantee that this Model is 100% accurate, reliable, functional, timely, secure and safety, error-free, uninterrupted, continuously stable, or free of any faults;
3.  The output of this Model does not constitute any advices or decisions for you, and it does not guarantee the authenticity, completeness, accuracy, timeliness, legality, functionality, or practicality of the generated content. The generated content cannot replace professionals in medical, legal, and other fields in answering your questions. The generated content is for your reference only and does not represent any attitude, standpoint, or position of Huawei. You need to make independent judgments based on your actual situation, and Huawei does not assume any responsibilities.

## Citation

If our work is helpful for your research or projects, we appreciate your citation.

```bibtex
@article{tang2025pangu,
  title={Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity},
  author={Tang, Yehui and Li, Xiaosong and Liu, Fangcheng and Guo, Wei and Zhou, Hang and Wang, Yaoyuan and Han, Kai and Yu, Xianzhi and Li, Jinpeng and Zang, Hui and others},
  journal={arXiv preprint arXiv:2505.21411},
  year={2025}
}
```

## Contact

If you have any question, please raise an issue or contact us at pangutech@huawei.com