Snowflake/Arctic-LSTM-Speculator-Llama-3.1-70B-Instruct


jeffra committed on Apr 30

Commit

a1604d2

1 Parent(s): 95d27eb

Create README.md

Files changed (1)

README.md +32 -0

README.md ADDED

	@@ -0,0 +1,32 @@

+---
+license: cc-by-nc-4.0
+---
+# ArcticSpeculator
+Build the fastest open-source vLLM-based speculative decoding system for your own model using [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) and [ArcticInference](https://github.com/snowflakedb/ArcticInference)!
+The table below compares the throughput (tokens/s) of existing vLLM-based speculative decoding systems for Llama-3.1-70B-Instruct on 8xH100:
+| Method                       | ShareGPT  | HumanEval |
+|------------------------------|-----------|-----------|
+| vLLM V1 Baseline             | 84.1      | 84.1      |
+| vLLM V1 Eagle                | 102.2     | 112.0     |
+| vLLM V1 Eagle3               | 77.7      | 85.3      |
+| vLLM V0 MLP-Speculator (IBM) | 77.9      | 66.7      |
+| ArcticSpeculator             | **172.4** | **203.7** |
+For more details about ArcticSpeculator and how to use it:
+* ❄️ [Using Arctic-Inference and Arctic-Training for improving real-world speculative decoding Performance (blog)]()
+* 🚀 [Getting started guide using ArcticTraining](https://github.com/snowflakedb/ArcticTraining/tree/mlp-variant-speculator/projects/mlp_variant_speculator)
+We also release the ArcticSpeculator checkpoints that we trained with [ArcticTraining](https://github.com/snowflakedb/ArcticTraining) to run with [ArcticInference](https://github.com/snowflakedb/ArcticInference):
+| model | ArcticSpeculator |
+|---- | ---- |
+| [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | |
+| [Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | |
+| [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct) | |
+<!-- | [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | |
+| [openhands-lm-32b-v0.1-ep3](https://huggingface.co/all-hands/openhands-lm-32b-v0.1-ep3)| | -->
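
Reading note (not part of the committed README): the throughput table in the diff is easier to interpret as speedup over the baseline row. The sketch below is only arithmetic on the numbers reported above; the dictionary layout and the `speedups` helper are hypothetical names introduced for illustration. It assumes the vLLM V1 Baseline row is the right reference for every method, including the vLLM V0 MLP-Speculator row, which the README does not state explicitly.

```python
# Throughput in tokens/s, copied from the README table (Llama-3.1-70B-Instruct, 8xH100).
THROUGHPUT = {
    "vLLM V1 Baseline": {"ShareGPT": 84.1, "HumanEval": 84.1},
    "vLLM V1 Eagle": {"ShareGPT": 102.2, "HumanEval": 112.0},
    "vLLM V1 Eagle3": {"ShareGPT": 77.7, "HumanEval": 85.3},
    "vLLM V0 MLP-Speculator (IBM)": {"ShareGPT": 77.9, "HumanEval": 66.7},
    "ArcticSpeculator": {"ShareGPT": 172.4, "HumanEval": 203.7},
}


def speedups(throughput, baseline="vLLM V1 Baseline"):
    """Return each method's throughput divided by the baseline's, per benchmark.

    Hypothetical helper: assumes one baseline row applies to all methods.
    """
    base = throughput[baseline]
    return {
        method: {bench: round(value / base[bench], 2) for bench, value in row.items()}
        for method, row in throughput.items()
    }


if __name__ == "__main__":
    for method, row in speedups(THROUGHPUT).items():
        cells = ", ".join(f"{bench}: {ratio:.2f}x" for bench, ratio in row.items())
        print(f"{method:<30} {cells}")
```

Under that assumption, ArcticSpeculator comes out at roughly 2.05x the baseline on ShareGPT and 2.42x on HumanEval, while the Eagle3 and MLP-Speculator rows fall at or below 1.0x on at least one benchmark.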