Create README.md
README.md
ADDED
@@ -0,0 +1,32 @@
---
license: cc-by-nc-4.0
---

# ArcticSpeculator

Build the fastest open-source, vLLM-based speculative decoding system for your own model, using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!

We compare the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama-3.1-70B-Instruct on 8xH100 below:

| Method                               | ShareGPT (tokens/s) | HumanEval (tokens/s) |
|--------------------------------------|---------------------|----------------------|
| vLLM V1 Baseline                     | 84.1                | 84.1                 |
| vLLM V1 Eagle                        | 102.2               | 112.0                |
| vLLM V1 Eagle3                       | 77.7                | 85.3                 |
| vLLM V0 MLP-Speculator (IBM)         | 77.9                | 66.7                 |
| ArcticSpeculator                     | **172.4**           | **203.7**            |
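
To make the comparison easier to read, the throughput numbers can be turned into speedups relative to the baseline row. The snippet below is only an illustrative sketch: the figures are copied from the table above, while the dictionary layout and the `speedups` helper are hypothetical names introduced here and are not part of ArcticTraining or ArcticInference.

```python
# Throughput in tokens/s, copied from the table above
# (Llama-3.1-70B-Instruct on 8xH100).
THROUGHPUT = {
    "vLLM V1 Baseline": {"ShareGPT": 84.1, "HumanEval": 84.1},
    "vLLM V1 Eagle": {"ShareGPT": 102.2, "HumanEval": 112.0},
    "vLLM V1 Eagle3": {"ShareGPT": 77.7, "HumanEval": 85.3},
    "vLLM V0 MLP-Speculator (IBM)": {"ShareGPT": 77.9, "HumanEval": 66.7},
    "ArcticSpeculator": {"ShareGPT": 172.4, "HumanEval": 203.7},
}


def speedups(results, baseline="vLLM V1 Baseline"):
    """Return each method's throughput divided by the baseline's, per dataset.

    `speedups` is a hypothetical helper for this README, not a library API.
    """
    base = results[baseline]
    return {
        method: {dataset: round(tps / base[dataset], 2) for dataset, tps in row.items()}
        for method, row in results.items()
    }


if __name__ == "__main__":
    for method, row in speedups(THROUGHPUT).items():
        print(f"{method:30s} ShareGPT x{row['ShareGPT']:.2f}  HumanEval x{row['HumanEval']:.2f}")
```

With these inputs, ArcticSpeculator comes out at roughly 2.05x the baseline on ShareGPT and 2.42x on HumanEval.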

For more details about ArcticSpeculator and how to use it:

* ❄️ [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding Performance (blog)]()
* 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)

We also release ArcticSpeculator checkpoints we trained with [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) to run with [ArcticInference](https://github.com/snowflakedb/ArcticInference):

| model | ArcticSpeculator |
|---- | ---- |
| [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | |
| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | |
| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | |
<!-- | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | |
| [openhands-lm-32b-v0.1-ep3](https://huggingface.co/all-hands/openhands-lm-32b-v0.1-ep3)| | -->