Qwen3-14B-speculator.eagle3

Model Overview

  • Verifier: Qwen/Qwen3-14B
  • Speculative Decoding Algorithm: EAGLE-3
  • Model Architecture: Eagle3Speculator
  • Model Size: 1.39B params (BF16)
  • Release Date: 09/18/2025
  • Version: 1.0
  • Model Developers: Red Hat

This is a speculator model designed for use with Qwen/Qwen3-14B, based on the EAGLE-3 speculative decoding algorithm. It was trained with the speculators library on a combination of the Aeala/ShareGPT_Vicuna_unfiltered dataset and the train_sft split of HuggingFaceH4/ultrachat_200k. Inputs must use the Qwen/Qwen3-14B chat template, so the model should be served through the /chat/completions endpoint.

Use with vLLM

vllm serve Qwen/Qwen3-14B \
  -tp 2 \
  --speculative-config '{
    "model": "RedHatAI/Qwen3-14B-speculator.eagle3",
    "num_speculative_tokens": 3,
    "method": "eagle3"
  }'
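Once the server is up, requests go through the OpenAI-compatible /chat/completions endpoint so that the Qwen3 chat template is applied. A minimal sketch of building such a request with only the standard library; the base URL assumes vLLM's default port (8000), and the prompt is illustrative:

```python
import json
import urllib.request

# Payload for vLLM's OpenAI-compatible chat completions endpoint.
# The "model" field names the verifier; the speculator is applied
# server-side via the --speculative-config passed at startup.
payload = {
    "model": "Qwen/Qwen3-14B",
    "messages": [{"role": "user", "content": "What is 12 * 7?"}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",  # assumed default host/port
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)

# With a running server, send the request and read the reply:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```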

Evaluations

For each benchmark, acceptance_rates[i] is the percentage of verification steps in which the i-th drafted token was accepted, and conditional_acceptance_rates[i] is the same quantity conditioned on the first i-1 drafted tokens having been accepted.

Subset of GSM8k (math reasoning):

  • acceptance_rates = [71.7, 47.8, 29.1]
  • conditional_acceptance_rates = [71.7, 66.6, 60.9]

Subset of MTBench:

  • acceptance_rates = [64.3, 38.2, 21.0]
  • conditional_acceptance_rates = [64.3, 59.4, 54.9]
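The two lists are consistent with each other: each acceptance_rates entry is the running product of the conditional rates, and summing the unconditional rates estimates the mean number of accepted draft tokens per verification step. A small sketch (the helper name is illustrative) checking this against the GSM8k numbers above:

```python
def accepted_stats(conditional_rates):
    """From per-position conditional acceptance rates (%), derive the
    cumulative (unconditional) acceptance rates and the expected number
    of accepted draft tokens per verification step."""
    cum, p = [], 1.0
    for r in conditional_rates:
        p *= r / 100.0   # token i is accepted only if tokens 0..i-1 were
        cum.append(p)
    return [round(100 * c, 1) for c in cum], sum(cum)

cum, expected = accepted_stats([71.7, 66.6, 60.9])
print(cum)       # matches the GSM8k acceptance_rates: [71.7, 47.8, 29.1]
print(expected)  # ~1.49 accepted draft tokens per step, on average
```

Each verification step also emits one token from the verifier itself, so on GSM8k a step produces roughly 2.5 tokens on average.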