# ArcticSpeculator

Build the fastest open-source vLLM-based speculative decoding system for your own model, using ArcticTraining and ArcticInference!

Throughput (tokens/s) of gpt-oss-120b on 8xH100 GPUs using vLLM:

| Method | ShareGPT | HumanEval |
|---|---|---|
| vLLM V1 Baseline | 220.2 | 220.7 |
| ArcticSpeculator | 377.3 | 400.0 |

For more details about ArcticSpeculator and how to use it, see the ArcticInference and ArcticTraining repositories.

See all of the speculators we have released in our Speculators Collection.
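As a rough sketch of how a speculator like this is typically attached to a vLLM server via speculative decoding: the exact package name, config keys, and values below are assumptions (including the `arctic` method name and the draft-token count), so consult the ArcticInference documentation for the authoritative invocation.

```shell
# Assumption: ArcticInference ships as a vLLM plugin package.
pip install "arctic-inference[vllm]"

# Launch vLLM with the LSTM speculator as the draft model.
# The speculative-config keys and num_speculative_tokens value
# are illustrative, not verified against a specific release.
vllm serve openai/gpt-oss-120b \
  --tensor-parallel-size 8 \
  --speculative-config '{
    "method": "arctic",
    "model": "Snowflake/Arctic-LSTM-Speculator-gpt-oss-120b",
    "num_speculative_tokens": 3
  }'
```

Once the server is up, requests go through the usual OpenAI-compatible endpoint; speculative decoding changes throughput and latency, not the API surface.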

