Speculators
Build the fastest OSS vLLM-based speculative decoding system for your own model, using ArcticTraining and ArcticInference!
Throughput (tokens/s) of gpt-oss-120b on 8xH100 using vLLM:
| Method | ShareGPT | HumanEval |
|---|---|---|
| vLLM V1 Baseline | 220.2 | 220.7 |
| ArcticSpeculator | 377.3 | 400.0 |
For more details about ArcticSpeculator and how to use it, see the ArcticTraining and ArcticInference repositories.
See all of the speculators we have released via our Speculators Collection.
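As a rough illustration only (not the project's documented API), the sketch below shows how a trained speculator can be plugged into vLLM's Python interface for speculative decoding. The target model and tensor-parallel size mirror the benchmark above; the speculator checkpoint name and `num_speculative_tokens` value are placeholders, and the exact configuration keys for ArcticSpeculator should be taken from the ArcticInference documentation.

```python
# Hypothetical sketch: speculative decoding with vLLM's Python API.
# The draft-model name and num_speculative_tokens below are placeholders,
# not the documented ArcticSpeculator configuration.
from vllm import LLM, SamplingParams

llm = LLM(
    model="openai/gpt-oss-120b",               # target model from the table above
    tensor_parallel_size=8,                    # 8xH100, as in the benchmark
    speculative_config={
        "model": "<your-arctic-speculator>",   # placeholder: your trained speculator checkpoint
        "num_speculative_tokens": 3,           # assumption: a typical small draft length
    },
)

outputs = llm.generate(
    ["Write a function that reverses a linked list."],
    SamplingParams(temperature=0.0, max_tokens=256),
)
print(outputs[0].outputs[0].text)
```

The same configuration can also be passed to `vllm serve` as a JSON string via `--speculative-config` when running an OpenAI-compatible server instead of the offline API.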