license: apache-2.0

ArcticSpeculator

Build the fastest open-source, vLLM-based speculative decoding system for your own model using ArcticTraining and ArcticInference!

Throughput (tokens/s) of gpt-oss-120b served with vLLM on 8xH100:

| Method | ShareGPT | HumanEval |
| --- | --- | --- |
| vLLM V1 Baseline | 220.2 | 220.7 |
| ArcticSpeculator | 377.3 | 400.0 |
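
The relative speedup implied by these figures can be worked out directly. The snippet below is a minimal sketch that only divides the numbers reported above; the `throughput` dictionary and the `speedup` helper are illustrative names, not part of ArcticInference or vLLM.

```python
# Throughput figures (tokens/s) copied from the table above.
throughput = {
    "ShareGPT": {"baseline": 220.2, "speculator": 377.3},
    "HumanEval": {"baseline": 220.7, "speculator": 400.0},
}


def speedup(baseline: float, speculator: float) -> float:
    """Return the ratio of speculator throughput to baseline throughput."""
    return speculator / baseline


# Hypothetical helper usage: one speedup ratio per benchmark.
speedups = {name: speedup(**row) for name, row in throughput.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.2f}x")
# prints:
# ShareGPT: 1.71x
# HumanEval: 1.81x
```

On these two benchmarks, that works out to roughly 1.7x to 1.8x the throughput of the vLLM V1 baseline.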

For more details about ArcticSpeculator and how to use it:

See all of the speculators we have released via our Speculators Collection