<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
<img alt="AngelSlim" src="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo.png?raw=true" width=55%>
</picture>
</p>

<h3 align="center">
Dedicated to building a more intuitive, comprehensive, and efficient LLM compression toolkit.
</h3>

<p align="center">
📖 <a href="https://angelslim.readthedocs.io/">Documentation</a> | 🤗 <a href="https://huggingface.co/AngelSlim">Hugging Face</a> | 🤖 <a href="https://modelscope.cn/organization/AngelSlim">ModelScope</a> | 💬 <a href="./docs/source/assets/angel_slim_wechat.png">WeChat</a>
<br>
</p>

## Table of Contents

- [Latest Updates](#latest-updates)
- [Key Features](#key-features)
- [Supported Models](#supported-models)
- [How to Use](#how-to-use)
  - [Install AngelSlim](#install-angelslim)
  - [Quick Start](#quick-start)
  - [Deployment & Evaluation](#deployment)
- [Benchmark](#benchmark)
- [License](#license)
- [Citation](#citation)
- [Technical Discussion](#technical-discussion)

## 📣 Latest Updates

- [25/07/04] We now support quantization for Hunyuan, Qwen2.5, Qwen3, DeepSeek-R1-Distill-Qwen, and other models, covering the INT8, FP8, and INT4 algorithms. We have also open-sourced the Eagle3 model weights for Qwen3-8B.

Coming soon:

- [ ] Support W4A8 quantization for DeepSeek-R1.
- [ ] Support quantization for multimodal models like Qwen-VL.
- [ ] Release a new algorithm for speculative sampling.

## 🌟 Key Features

- **Highly Integrated**: This toolkit integrates mainstream compression algorithms into a unified framework, offering developers one-click access with exceptional ease of use.
- **Continuous Innovation**: Beyond integrating widely-used industry algorithms, we are continuously researching better compression algorithms, which will be gradually open-sourced in the future.
- **Performance-Driven**: We continuously optimize end-to-end performance in model compression workflows and algorithm deployment, such as enabling quantization of models like Qwen3-235B and DeepSeek-R1 on a single GPU.

## 💼 Supported Models

### Quantization

Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwen3-Dense, Qwen3-MoE, Qwen2.5, DeepSeek-R1 distilled Qwen models, and QwQ:

| Model | FP8-Dynamic | FP8-Static | INT8-Dynamic | INT4-GPTQ | INT4-AWQ |
| ----- | ----------- | ---------- | ------------ | --------- | -------- |
| [Hunyuan-Dense](https://huggingface.co/tencent/Hunyuan-7B-Instruct) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Hunyuan-MoE](https://huggingface.co/collections/tencent/hunyuan-a13b-685ec38e5b46321e3ea7c4be) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Qwen3-Dense](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Qwen3-MoE](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [Qwen2.5](https://huggingface.co/collections/AngelSlim/qwen2-25-quant-68652d6cbdf5c0d4b1c4499a) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [DeepSeek-R1-Distill-Qwen](https://huggingface.co/collections/AngelSlim/deepseek-r1-distill-quant-68652f16a9c206b030b05f7f) | ✅ | ✅ | ✅ | ✅ | ✅ |
| [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Speculative Decoding

The Eagle3 weights for the Qwen3-8B model are now available, with Eagle3 weights for other models in the Qwen3 series to be released soon.

| Model | Eagle3 |
| --------- | ----------- |
| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3/tree/main) | ✅ |
| Qwen3-14B | coming soon |
| Qwen3-32B | coming soon |

## 🛎️ How to Use

### Install AngelSlim

We recommend using `pip` to install the latest stable version of `AngelSlim`:

```shell
pip install angelslim
```

Alternatively, you can clone the repository and install from source:

```shell
cd AngelSlim && python setup.py install
```

For more detailed installation instructions, please refer to the [Installation Documentation](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/installation.html).

### Quick Start

After installing `AngelSlim`, you can get started quickly by running the following script to perform static `FP8` quantization on the `Qwen3-1.7B` model:

* One-click Start

```shell
python3 tools/run.py -c configs/qwen3/fp8_static/qwen3-1_7b_fp8_static.yaml
```

This example loads the Hugging Face model, performs activation calibration using the `dataset` specified in the config file, and saves the quantized model weights.
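
For intuition: static quantization fixes activation scales ahead of time from calibration data, while dynamic quantization recomputes them on the fly at inference time. The sketch below illustrates the difference for FP8 (e4m3). It is schematic only, not AngelSlim's internal implementation, and assumes `torch>=2.1` for the `float8_e4m3fn` dtype:

```python
import torch

FP8_MAX = 448.0  # largest finite value representable in torch.float8_e4m3fn

def to_fp8(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Scale into the representable FP8 range, clamp, then cast.
    return (x / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)

# Static: the activation scale is computed once, from calibration data.
calib_acts = torch.randn(1024, 2048)
static_scale = calib_acts.abs().max() / FP8_MAX

# Dynamic: the scale is recomputed from each tensor seen at runtime.
def to_fp8_dynamic(x: torch.Tensor):
    scale = x.abs().max() / FP8_MAX
    return to_fp8(x, scale), scale
```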

* Code-based Start

To perform dynamic `FP8` quantization on `Qwen3-1.7B`:

```python
from angelslim.engine import Engine

slim_engine = Engine()
# Prepare model
slim_engine.prepare_model(model_name="Qwen", model_path="Qwen/Qwen3-1.7B")
# Initialize compressor
slim_engine.prepare_compressor("PTQ", default_method="fp8_dynamic")
# Compress model
slim_engine.run()
# Save compressed model
slim_engine.save("./output")
```
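
The directory written by `save()` can then be loaded by an FP8-capable inference engine. As a quick local smoke test, here is a minimal sketch using vLLM's offline API (assuming your vLLM version supports the saved checkpoint's quantization format):

```python
from vllm import LLM, SamplingParams

# Load the quantized checkpoint produced by slim_engine.save("./output").
llm = LLM(model="./output")
sampling = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is model quantization?"], sampling)
print(outputs[0].outputs[0].text)
```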

For more details, please refer to the [Quick Start Documentation](https://angelslim.readthedocs.io/zh-cn/latest/getting_started/quickstrat.html).

### 🖥️ Deployment and Testing

#### 1. API Service Deployment

After specifying the quantized model path `MODEL_PATH`, you can deploy an OpenAI-compatible API service with the following LLM inference frameworks:

**vLLM**

Use the following script to launch a [vLLM](https://github.com/vllm-project/vllm) server; version `vllm>=0.8.5.post1` is recommended. For MoE INT8 quantized models, `vllm>=0.9.0` is required.

```shell
bash deploy/run_vllm.sh $MODEL_PATH
```

**SGLang**

Use the following script to launch an [SGLang](https://github.com/sgl-project/sglang) server; version `sglang>=0.4.6.post1` is recommended.

```shell
bash deploy/run_sglang.sh $MODEL_PATH
```

#### 2. Service Invocation

Send requests using [OpenAI's API format](https://platform.openai.com/docs/api-reference/introduction):

```shell
bash deploy/openai.sh $MODEL_PATH
```
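
Since the server is OpenAI-compatible, any OpenAI client can talk to it directly. A minimal sketch with the `openai` Python package (the port and served model name below are assumptions; match them to your deployment script's settings):

```python
from openai import OpenAI

# Point the client at the locally deployed server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="qwen3-1.7b-fp8",  # hypothetical served model name
    messages=[{"role": "user", "content": "Summarize FP8 quantization in one sentence."}],
)
print(response.choices[0].message.content)
```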

#### 3. Performance Evaluation

Evaluate the performance of the quantized model using [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness); version `lm-eval>=0.4.8` is recommended:

```shell
bash deploy/lm_eval.sh $MODEL_PATH
```
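
Besides the helper script, you can also call the harness's Python API directly. A minimal sketch (the task choice and model backend are illustrative; see the lm-eval documentation for the options your version supports):

```python
import lm_eval

# Evaluate the quantized checkpoint with the Hugging Face backend.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./output,dtype=auto",
    tasks=["gsm8k"],
)
print(results["results"]["gsm8k"])
```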

For more details, please refer to the [Deployment Documentation](https://angelslim.readthedocs.io/zh-cn/latest/deployment/deploy.html).

## 📈 Benchmark

### Quantization

The performance test results for selected models are shown below. For the complete benchmark, refer to the [Benchmark documentation](https://angelslim.readthedocs.io/zh-cn/latest/performance/quantization/benchmarks.html).

#### Hunyuan Series Models

Benchmark results for the `Hunyuan-A13B-Instruct` model with `FP8` and `INT4-GPTQ` quantization algorithms on datasets including `AIME 2024`, `GSM8K`, `BBH`, and `DROP`:

| Bench | Hunyuan-A13B-Instruct | Hunyuan-A13B-Instruct-FP8 | Hunyuan-A13B-Instruct-Int4-GPTQ |
|:---------:|:---------------------:|:-------------------------:|:-------------------------------:|
| AIME 2024 | 87.3 | 86.7 | 86.7 |
| GSM8K | 94.39 | 94.01 | 94.24 |
| BBH | 89.1 | 88.34 | 87.91 |
| DROP | 91.1 | 91.1 | 91.05 |

#### Qwen3 Series Models

Benchmark results for Qwen3 series models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ` quantization algorithms on datasets including `CEVAL`, `MMLU`, `GSM8K`, and `HUMANEVAL`:

<table>
<thead>
<tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th><th>HUMANEVAL</th></tr>
</thead>
<tbody>
<tr><td rowspan="4">Qwen3-0.6B</td><td>BF16</td><td>45.84</td><td>47.21</td><td>42.99</td><td>19.51</td></tr>
<tr><td>FP8-Static</td><td>45.99</td><td>46.87</td><td>38.06</td><td>18.90</td></tr>
<tr><td>FP8-Dynamic</td><td>45.99</td><td>46.93</td><td>38.29</td><td>20.73</td></tr>
<tr><td>INT8-Dynamic</td><td>45.17</td><td>46.95</td><td>41.17</td><td>21.34</td></tr>
<tr><td rowspan="6">Qwen3-8B</td><td>BF16</td><td>79.27</td><td>74.78</td><td>87.79</td><td>63.41</td></tr>
<tr><td>FP8-Static</td><td>78.23</td><td>74.79</td><td>86.96</td><td>62.20</td></tr>
<tr><td>FP8-Dynamic</td><td>78.45</td><td>74.75</td><td>87.64</td><td>62.80</td></tr>
<tr><td>INT8-Dynamic</td><td>78.01</td><td>74.84</td><td>86.96</td><td>67.07</td></tr>
<tr><td>INT4-GPTQ</td><td>77.19</td><td>73.26</td><td>86.43</td><td>62.20</td></tr>
<tr><td>INT4-AWQ</td><td>76.15</td><td>73.59</td><td>86.96</td><td>63.41</td></tr>
<tr><td rowspan="6">Qwen3-14B</td><td>BF16</td><td>83.06</td><td>78.90</td><td>88.40</td><td>55.49</td></tr>
<tr><td>FP8-Static</td><td>82.62</td><td>78.57</td><td>89.46</td><td>57.32</td></tr>
<tr><td>FP8-Dynamic</td><td>82.24</td><td>78.92</td><td>88.32</td><td>52.44</td></tr>
<tr><td>INT8-Dynamic</td><td>81.87</td><td>78.13</td><td>86.28</td><td>56.10</td></tr>
<tr><td>INT4-GPTQ</td><td>81.05</td><td>78.02</td><td>87.34</td><td>57.93</td></tr>
<tr><td>INT4-AWQ</td><td>82.02</td><td>77.68</td><td>84.23</td><td>61.59</td></tr>
<tr><td rowspan="5">Qwen3-32B</td><td>BF16</td><td>86.55</td><td>82.00</td><td>74.53</td><td>37.80</td></tr>
<tr><td>FP8-Static</td><td>86.92</td><td>81.78</td><td>70.20</td><td>39.63</td></tr>
<tr><td>FP8-Dynamic</td><td>86.55</td><td>81.89</td><td>70.43</td><td>38.41</td></tr>
<tr><td>INT4-GPTQ</td><td>86.18</td><td>81.01</td><td>-</td><td>43.29</td></tr>
<tr><td>INT4-AWQ</td><td>86.18</td><td>81.54</td><td>-</td><td>36.59</td></tr>
<tr><td rowspan="4">Qwen3-30B-A3B</td><td>BF16</td><td>83.66</td><td>79.36</td><td>89.99</td><td>31.71</td></tr>
<tr><td>FP8-Static</td><td>83.95</td><td>79.47</td><td>89.01</td><td>31.10</td></tr>
<tr><td>FP8-Dynamic</td><td>84.10</td><td>79.40</td><td>89.16</td><td>32.93</td></tr>
<tr><td>INT8-Dynamic</td><td>83.36</td><td>79.48</td><td>89.16</td><td>34.15</td></tr>
<tr><td rowspan="4">Qwen3-235B-A22B</td><td>BF16</td><td>89.60</td><td>86.28</td><td>85.29</td><td>27.44</td></tr>
<tr><td>FP8-Static</td><td>89.67</td><td>86.19</td><td>86.96</td><td>27.44</td></tr>
<tr><td>FP8-Dynamic</td><td>89.67</td><td>86.18</td><td>85.22</td><td>28.05</td></tr>
<tr><td>INT8-Dynamic</td><td>88.93</td><td>86.20</td><td>86.20</td><td>23.78</td></tr>
<tr><td rowspan="5">QwQ-32B</td><td>BF16</td><td>85.74</td><td>82.03</td><td>73.31</td><td>42.68</td></tr>
<tr><td>FP8-Static</td><td>85.44</td><td>81.91</td><td>75.36</td><td>42.68</td></tr>
<tr><td>FP8-Dynamic</td><td>85.07</td><td>81.93</td><td>75.66</td><td>42.07</td></tr>
<tr><td>INT4-GPTQ</td><td>84.03</td><td>81.26</td><td>68.23</td><td>45.73</td></tr>
<tr><td>INT4-AWQ</td><td>83.58</td><td>81.01</td><td>68.69</td><td>43.29</td></tr>
</tbody>
</table>

#### Other Models

Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`, and `INT4-AWQ` quantization algorithms on datasets including `CEVAL`, `MMLU`, and `GSM8K`:

<table>
<thead>
<tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th></tr>
</thead>
<tbody>
<tr><td rowspan="3">Qwen2.5-1.5B-Instruct</td><td>BF16</td><td>67.01</td><td>60.05</td><td>54.28</td></tr>
<tr><td>FP8-Static</td><td>66.27</td><td>60.23</td><td>-</td></tr>
<tr><td>FP8-Dynamic</td><td>66.79</td><td>60.08</td><td>51.71</td></tr>
<tr><td rowspan="5">Qwen2.5-7B-Instruct</td><td>BF16</td><td>81.20</td><td>74.55</td><td>79.98</td></tr>
<tr><td>FP8-Static</td><td>81.13</td><td>74.03</td><td>79.30</td></tr>
<tr><td>FP8-Dynamic</td><td>80.31</td><td>74.07</td><td>79.00</td></tr>
<tr><td>INT4-GPTQ</td><td>79.05</td><td>73.05</td><td>74.75</td></tr>
<tr><td>INT4-AWQ</td><td>79.35</td><td>73.22</td><td>79.38</td></tr>
<tr><td rowspan="5">Qwen2.5-32B-Instruct</td><td>BF16</td><td>87.30</td><td>83.21</td><td>81.73</td></tr>
<tr><td>FP8-Static</td><td>87.59</td><td>83.08</td><td>81.58</td></tr>
<tr><td>FP8-Dynamic</td><td>87.30</td><td>83.04</td><td>81.58</td></tr>
<tr><td>INT4-GPTQ</td><td>86.70</td><td>82.45</td><td>82.03</td></tr>
<tr><td>INT4-AWQ</td><td>87.00</td><td>82.64</td><td>-</td></tr>
<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-7B</td><td>BF16</td><td>53.49</td><td>53.80</td><td>75.74</td></tr>
<tr><td>FP8-Static</td><td>53.57</td><td>54.17</td><td>76.19</td></tr>
<tr><td>FP8-Dynamic</td><td>52.97</td><td>54.13</td><td>74.15</td></tr>
<tr><td>INT4-GPTQ</td><td>51.86</td><td>52.44</td><td>75.89</td></tr>
<tr><td>INT4-AWQ</td><td>53.49</td><td>53.70</td><td>-</td></tr>
<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-14B</td><td>BF16</td><td>77.71</td><td>74.28</td><td>85.67</td></tr>
<tr><td>FP8-Static</td><td>77.56</td><td>74.66</td><td>86.73</td></tr>
<tr><td>FP8-Dynamic</td><td>76.82</td><td>74.63</td><td>87.11</td></tr>
<tr><td>INT4-GPTQ</td><td>74.29</td><td>72.37</td><td>84.61</td></tr>
<tr><td>INT4-AWQ</td><td>74.81</td><td>73.00</td><td>86.05</td></tr>
<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-32B</td><td>BF16</td><td>84.18</td><td>80.89</td><td>87.41</td></tr>
<tr><td>FP8-Static</td><td>83.43</td><td>80.90</td><td>87.57</td></tr>
<tr><td>FP8-Dynamic</td><td>83.73</td><td>81.10</td><td>86.43</td></tr>
<tr><td>INT4-GPTQ</td><td>84.10</td><td>79.80</td><td>86.73</td></tr>
<tr><td>INT4-AWQ</td><td>82.84</td><td>80.15</td><td>87.19</td></tr>
</tbody>
</table>

### Speculative Decoding

Benchmark results for Qwen3 series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:

#### Qwen3-8B

<table border="0">
<thead>
<tr><th rowspan="3">Temperature</th><th rowspan="3">Method</th><th colspan="8">Datasets</th></tr>
<tr><th colspan="2">MT-bench</th><th colspan="2">HumanEval</th><th colspan="2">GSM8K</th><th colspan="2">Alpaca</th></tr>
<tr><th>Speedup</th><th>Accept length</th><th>Speedup</th><th>Accept length</th><th>Speedup</th><th>Accept length</th><th>Speedup</th><th>Accept length</th></tr>
</thead>
<tbody>
<tr><td>T=0</td><td>Eagle3</td><td>2.63x</td><td>3.65</td><td>2.76x</td><td>3.85</td><td>2.82x</td><td>3.90</td><td>2.62x</td><td>3.48</td></tr>
<tr><td>T=1</td><td>Eagle3</td><td>1.98x</td><td>2.75</td><td>2.25x</td><td>3.11</td><td>2.31x</td><td>3.15</td><td>2.10x</td><td>2.76</td></tr>
</tbody>
</table>

## 📝 Model License

The code for this project is open-sourced under the [License for AngelSlim](License_AngelSlim_model_and_dataset.txt).

## 🔗 Citation

```
@software{AngelSlim2025,
    title={{AngelSlim}},
    author={Tencent AngelSlim Project Contributors},
    year={2025},
    month={6},
    url={https://github.com/Tencent/AngelSlim},
}
```

## 💬 Technical Discussion

* AngelSlim is continuously iterating, and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub or join our [WeChat technical discussion group](https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/angel_slim_wechat.png?raw=true).