Update README.md

README.md CHANGED

@@ -1,6 +1,8 @@
---
tags:
- qwen3
+ - eagle3
+ - eagle
---

<p align="center">

@@ -67,13 +69,16 @@ Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwe…
| [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |

### Speculative Decoding
- The Eagle3 weights for the Qwen3 …
+ The Eagle3 weights for the Qwen3 series models are now available.

| Model | Eagle3 |
| ----------| ----------------- |
- | [Qwen3- …
- | Qwen3- …
- | Qwen3- …
+ | [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) | ✅ |
+ | [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) | ✅ |
+ | [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) | ✅ |
+ | [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | ✅ |
+ | [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | ✅ |
+ | [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | ✅ |

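The linked repositories hold the Eagle3 draft weights, intended to be paired with the matching Qwen3 target model inside an EAGLE-3-capable inference engine. A minimal sketch, assuming a recent vLLM build with EAGLE-3 speculative decoding support (the exact `speculative_config` keys may differ between versions; this is an illustration, not a snippet from this repository's docs):

```python
# Hedged sketch: pair Qwen3-8B with its Eagle3 draft weights for speculative decoding.
# Assumes a vLLM release with EAGLE-3 support; the config key names may vary by version.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",                      # target model being accelerated
    speculative_config={
        "method": "eagle3",                     # drafting scheme
        "model": "AngelSlim/Qwen3-8B_eagle3",   # draft weights from the table above
        "num_speculative_tokens": 5,            # tokens proposed per draft step (example value)
    },
)

outputs = llm.generate(
    ["Explain speculative decoding in two sentences."],
    SamplingParams(temperature=0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```
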
## 🖥️ How to Use

@@ -170,7 +175,7 @@ For more details, please refer to the [Deployment Documentation](https://angel…

## 📈 Benchmark

- ### Quantization
+ ### (1) Quantization

The performance test results for selected models are shown below. For the complete benchmark, refer to the [Benchmark documentation](https://angelslim.readthedocs.io/zh-cn/latest/performance/quantization/benchmarks.html).

@@ -271,27 +276,43 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`…
</tbody>
</table>

- ### Speculative Decoding
+ ### (2) Speculative Decoding
Benchmark results for Qwen3 series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:

- <table border="0">
+ <table>
<thead>
- <tr …
+ <tr>
+ <th> </th><th> </th>
+ <th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
+ <th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
+ <th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
+ <th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
+ <th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
+ <tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
</thead>
<tbody>
- <tr><td> …
- <tr><td>T= …
+ <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
+ <tr><td rowspan="6"><strong>T=0</strong></td>
+ <td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
+ <tr><td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
+ <tr><td>Qwen3-8B</td><td>2.65x</td><td>3.87</td><td>2.64x</td><td>3.82</td><td>2.86x</td><td>4.10</td><td>2.58x</td><td>3.55</td><td>2.68x</td><td>3.83</td></tr>
+ <tr><td>Qwen3-14B</td><td>2.23x</td><td>3.30</td><td>2.53x</td><td>3.74</td><td>2.56x</td><td>3.79</td><td>2.16x</td><td>3.13</td><td>2.37x</td><td>3.49</td></tr>
+ <tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
+ <tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
+ <!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
+ <tr><td rowspan="6"><strong>T=1</strong></td>
+ <td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
+ <tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
+ <tr><td>Qwen3-8B</td><td>1.91x</td><td>2.84</td><td>2.07x</td><td>3.05</td><td>2.34x</td><td>3.26</td><td>2.09x</td><td>2.92</td><td>2.10x</td><td>3.02</td></tr>
+ <tr><td>Qwen3-14B</td><td>1.71x</td><td>2.61</td><td>1.95x</td><td>2.87</td><td>2.04x</td><td>3.08</td><td>1.68x</td><td>2.55</td><td>1.84x</td><td>2.78</td></tr>
+ <tr><td>Qwen3-32B</td><td>1.62x</td><td>1.91</td><td>1.71x</td><td>2.05</td><td>1.78x</td><td>2.10</td><td>1.80x</td><td>1.95</td><td>1.62x</td><td>2.00</td></tr>
+ <tr><td>Qwen3-30B-A3B</td><td>1.91x</td><td>2.46</td><td>2.00x</td><td>2.64</td><td>1.90x</td><td>2.53</td><td>1.80x</td><td>2.32</td><td>1.90x</td><td>2.48</td></tr>
</tbody>
</table>

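Reading the table (the definitions below follow the usual EAGLE-style reporting and are not spelled out in this README): Speedup is generation speed relative to ordinary autoregressive decoding of the same target model, and τ is the average number of draft tokens accepted per verification step. The Mean column is consistent with a plain average over the four datasets, e.g. for Qwen3-1.7B at T=0:

$$
\text{Speedup}_{\text{mean}} = \frac{2.05 + 2.07 + 2.11 + 1.93}{4} = 2.04,\qquad
\tau_{\text{mean}} = \frac{2.81 + 2.93 + 2.98 + 2.69}{4} \approx 2.85
$$
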
- The code for this project is open-sourced under the [License for AngelSlim](License_AngelSlim_model_and_dataset.txt).
+ ## 📝 License
+ The code for this project is open-sourced under the [License for AngelSlim](LICENSE).

## 📝 Citation
