liusong123 committed (verified)
Commit d156528 · Parent: c8c8b1f

Update README.md

Files changed (1)
  1. README.md +38 -17
README.md CHANGED
@@ -1,6 +1,8 @@
 ---
 tags:
 - qwen3
+- eagle3
+- eagle
 ---
 
 <p align="center">
@@ -67,13 +69,16 @@ Currently supports the following LLMs, including Hunyuan-Dense, Hunyuan-MoE, Qwe
 | [QwQ](https://huggingface.co/collections/AngelSlim/qwen3-quant-68652e26da31740739d154f8) | ✅ | ✅ | ✅ | ✅ | ✅ |
 
 ### Speculative Decoding
-The Eagle3 weights for the Qwen3-8B model are now available, with Eagle3 weights for other models in the Qwen3 series to be released soon.
+The Eagle3 weights for the Qwen3 series models are now available.
 
 | Model | Eagle3 |
 | ----------| ----------------- |
-| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3/tree/main) | ✅ |
-| Qwen3-14B | coming soon |
-| Qwen3-32B | coming soon |
+| [Qwen3-1.7B](https://huggingface.co/AngelSlim/Qwen3-1.7B_eagle3) | ✅ |
+| [Qwen3-4B](https://huggingface.co/AngelSlim/Qwen3-4B_eagle3) | ✅ |
+| [Qwen3-8B](https://huggingface.co/AngelSlim/Qwen3-8B_eagle3) | ✅ |
+| [Qwen3-14B](https://huggingface.co/AngelSlim/Qwen3-14B_eagle3) | ✅ |
+| [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) | ✅ |
+| [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) | ✅ |
 
 ## 🛎️How to Use
 
@@ -170,7 +175,7 @@ For more details, please refer to the [Deployment Documentation](https://angel
 
 ## 📈 Benchmark
 
-### Quantization
+### (1) Quantization
 
 The performance test results for selected models are shown below. For the complete benchmark, refer to the [Benchmark documentation](https://angelslim.readthedocs.io/zh-cn/latest/performance/quantization/benchmarks.html)
 
@@ -271,27 +276,43 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
 </tbody>
 </table>
 
-### Speculative Decoding
+### (2) Speculative Decoding
 Benchmark results for Qwen3 series models with the `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:
 
-#### Qwen3-8B
-
-<table border="0">
+<table>
 <thead>
-<tr><th rowspan="3">Temperature</th><th rowspan="3">Method</th><th colspan="8">Datasets</th></tr>
-<tr><th colspan="2">MT-bench</th><th colspan="2">HumanEval</th><th colspan="2">GSM8K</th><th colspan="2">Alpaca</th></tr>
-<tr><th>Speedup</th><th>Accept length</th><th>Speedup</th><th>Accept length</th><th>Speedup</th><th>Accept length</th><th>Speedup</th><th>Accept length</th></tr>
+<tr>
+<th>&nbsp;</th><th>&nbsp;</th>
+<th colspan="2" style="text-align: center; vertical-align: middle;">MT-bench</th>
+<th colspan="2" style="text-align: center; vertical-align: middle;">HumanEval</th>
+<th colspan="2" style="text-align: center; vertical-align: middle;">GSM8K</th>
+<th colspan="2" style="text-align: center; vertical-align: middle;">Alpaca</th>
+<th colspan="2" style="text-align: center; vertical-align: middle;">Mean</th></tr>
+<tr><th>Temperature</th><th>Model</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th><th>Speedup</th><th>τ</th></tr>
 </thead>
 <tbody>
-<tr><td>T=0</td><td>Eagle3</td><td>2.63x</td><td>3.65</td><td>2.76x</td><td>3.85</td><td>2.82x</td><td>3.90</td><td>2.62x</td><td>3.48</td></tr>
-<tr><td>T=1</td><td>Eagle3</td><td>1.98x</td><td>2.75</td><td>2.25x</td><td>3.11</td><td>2.31x</td><td>3.15</td><td>2.10x</td><td>2.76</td></tr>
+<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
+<tr><td rowspan="6"><strong>T=0</strong></td>
+<td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
+<tr><td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
+<tr><td>Qwen3-8B</td><td>2.65x</td><td>3.87</td><td>2.64x</td><td>3.82</td><td>2.86x</td><td>4.10</td><td>2.58x</td><td>3.55</td><td>2.68x</td><td>3.83</td></tr>
+<tr><td>Qwen3-14B</td><td>2.23x</td><td>3.30</td><td>2.53x</td><td>3.74</td><td>2.56x</td><td>3.79</td><td>2.16x</td><td>3.13</td><td>2.37x</td><td>3.49</td></tr>
+<tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
+<tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
+<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
+<tr><td rowspan="6"><strong>T=1</strong></td>
+<td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
+<tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
+<tr><td>Qwen3-8B</td><td>1.91x</td><td>2.84</td><td>2.07x</td><td>3.05</td><td>2.34x</td><td>3.26</td><td>2.09x</td><td>2.92</td><td>2.10x</td><td>3.02</td></tr>
+<tr><td>Qwen3-14B</td><td>1.71x</td><td>2.61</td><td>1.95x</td><td>2.87</td><td>2.04x</td><td>3.08</td><td>1.68x</td><td>2.55</td><td>1.84x</td><td>2.78</td></tr>
+<tr><td>Qwen3-32B</td><td>1.62x</td><td>1.91</td><td>1.71x</td><td>2.05</td><td>1.78x</td><td>2.10</td><td>1.80x</td><td>1.95</td><td>1.62x</td><td>2.00</td></tr>
+<tr><td>Qwen3-30B-A3B</td><td>1.91x</td><td>2.46</td><td>2.00x</td><td>2.64</td><td>1.90x</td><td>2.53</td><td>1.80x</td><td>2.32</td><td>1.90x</td><td>2.48</td></tr>
 </tbody>
 </table>
 
+## 📝 License
 
-## 📝 Model License
-
-The code for this project is open-sourced under the [License for AngelSlim](License_AngelSlim_model_and_dataset.txt).
+The code for this project is open-sourced under the [License for AngelSlim](LICENSE).
 
 ## 🔗 Citation
 
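The updated table only lists where the Eagle3 draft weights are hosted. As a hedged, illustrative sketch of how such a draft checkpoint might be wired into an inference stack, the snippet below uses vLLM's speculative-decoding support. It assumes a recent vLLM release (roughly 0.8 or newer) that accepts a `speculative_config` engine argument and can load this repository as an EAGLE-3 draft; the exact keys and supported versions may differ, so treat the AngelSlim deployment documentation linked in the README as authoritative.

```python
# Hedged, illustrative sketch: serve Qwen3-8B with the AngelSlim Eagle3 draft
# weights via vLLM speculative decoding. Assumes a recent vLLM that accepts
# `speculative_config` and can load this checkpoint as an EAGLE-3 draft;
# key names may vary by version, so verify against the AngelSlim docs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-8B",                     # target model
    speculative_config={
        "method": "eagle3",                    # EAGLE-3 style drafting
        "model": "AngelSlim/Qwen3-8B_eagle3",  # draft weights from the table above
        "num_speculative_tokens": 5,           # draft length per step (tunable)
    },
)

outputs = llm.generate(
    ["Briefly explain speculative decoding."],
    SamplingParams(temperature=0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

The same pattern should apply to the other rows of the table by swapping the target model and its matching `*_eagle3` repository; `num_speculative_tokens` trades drafting overhead against the accepted length τ reported in the benchmark table.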