Commit a1604d2 by jeffra · verified · 1 Parent(s): 95d27eb

Create README.md

  1. README.md +32 -0
README.md ADDED
@@ -0,0 +1,32 @@
+ ---
+ license: cc-by-nc-4.0
+ ---
+
+ # ArcticSpeculator
+
+ Build the fastest open-source vLLM-based speculative decoding system for your own model using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
+
+ The table below compares the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama-3.1-70B-Instruct on 8xH100:
+
+ | Method                               | ShareGPT       | HumanEval    |
+ |--------------------------------------|----------------|--------------|
+ | vLLM V1 Baseline                     | 84.1           | 84.1         |
+ | vLLM V1 Eagle                        | 102.2          | 112.0        |
+ | vLLM V1 Eagle3                       | 77.7           | 85.3         |
+ | vLLM V0 MLP-Speculator (IBM)         | 77.9           | 66.7         |
+ | ArcticSpeculator                     | **172.4**      | **203.7**    |
+
+ For more details about ArcticSpeculator and how to use it:
+
+ * ❄️ [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding Performance (blog)]()
+ * 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)
+
+ We also release ArcticSpeculator checkpoints, trained with [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and ready to run with [ArcticInference](https://github.com/snowflakedb/ArcticInference):
+
+ | Model | ArcticSpeculator |
+ |---- | ---- |
+ | [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | |
+ | [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | |
+ | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | |
+ <!-- | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | |
+ | [openhands-lm-32b-v0.1-ep3](https://huggingface.co/all-hands/openhands-lm-32b-v0.1-ep3)| | -->
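
As a quick sanity check on the throughput table in this README, the relative speedup of each method over the vLLM V1 baseline can be computed directly from the reported numbers. The snippet below is a minimal sketch and is not part of the README, ArcticTraining, or ArcticInference: the `throughput` dictionary and the `speedups` helper are hypothetical names introduced for illustration, and the values are copied from the table. Note that the MLP-Speculator row was measured on vLLM V0, so its ratio against a V1 baseline is only indicative.

```python
# Throughput in tokens/s, copied from the README table: (ShareGPT, HumanEval).
throughput = {
    "vLLM V1 Baseline": (84.1, 84.1),
    "vLLM V1 Eagle": (102.2, 112.0),
    "vLLM V1 Eagle3": (77.7, 85.3),
    "vLLM V0 MLP-Speculator (IBM)": (77.9, 66.7),
    "ArcticSpeculator": (172.4, 203.7),
}


def speedups(table, baseline="vLLM V1 Baseline"):
    """Return each method's throughput divided by the baseline's, per dataset.

    Hypothetical helper for illustration; ratios are rounded to two decimals.
    """
    base = table[baseline]
    return {
        method: tuple(round(value / ref, 2) for value, ref in zip(values, base))
        for method, values in table.items()
    }


if __name__ == "__main__":
    for method, (sharegpt, humaneval) in speedups(throughput).items():
        print(f"{method:<30} ShareGPT {sharegpt:.2f}x  HumanEval {humaneval:.2f}x")
```

On these numbers, ArcticSpeculator comes out at roughly 2.05x the baseline on ShareGPT and 2.42x on HumanEval, while Eagle3 and the MLP-Speculator fall below the baseline on ShareGPT.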