Model Overview

Kimi-K2-Instruct-eagle3 is a specialized draft model that accelerates inference for the Kimi-K2-Instruct ecosystem using the EAGLE3 (Extrapolation Algorithm for Greater Language-model Efficiency) framework.

Kimi-K2-Instruct with EAGLE3 achieves up to 1.8× peak throughput versus the base model, accelerating generation across all 7 benchmarks, from +24% on MT-Bench to +80% on Math500 (configured with bs=8, steps=3, topk=1, num_draft_tokens=4).

Built on the Llama architecture, this model acts as a highly efficient drafter. It was trained on 1.4 million high-quality samples from the Open-PerfectBlend dataset to align its predictions closely with the teacher model's distribution.

This model serves as a general-purpose English instruction follower with strong capabilities in:

  • Conversation
  • Mathematical Reasoning
  • Code Generation

Efficient Download Guide

To minimize download time and storage usage, note which files in the repository each use case actually requires:

For Inference: You only need to download config.json and model.safetensors.

For Continued Training: The file training_state.pt contains optimizer states specifically for resuming training. If you only intend to use the model for inference, you can skip downloading this file.
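
For example, huggingface_hub can restrict a snapshot download to the inference files only (a minimal sketch, assuming the huggingface_hub package is installed; the repo id matches the draft-model path used in the launch command below):

# Download only the files needed for inference, skipping training_state.pt.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="AQ-MedAI/Kimi-K2-Instruct-eagle3",
    allow_patterns=["config.json", "model.safetensors"],
)
print(f"Draft model files saved to {local_dir}")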

Performance & Acceleration

The core value of this EAGLE model is its ability to predict multiple future tokens that are subsequently verified by the base model. High acceptance lengths indicate significant latency reduction.
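
Conceptually, one draft-and-verify round works like the sketch below (illustrative pseudocode only, not the SGLang implementation; draft_model and base_model are hypothetical stand-ins):

# Illustrative greedy speculative-decoding round; not SGLang internals.
def speculative_round(base_model, draft_model, context, num_draft_tokens=4):
    # 1. The cheap drafter proposes several future tokens at once.
    draft = draft_model.propose(context, n=num_draft_tokens)

    # 2. The base model scores context + draft in one forward pass,
    #    yielding n+1 greedy tokens: one per draft position plus a bonus.
    verified = base_model.greedy_tokens(context + draft)

    # 3. Accept the longest prefix on which the draft agrees with the base model.
    accepted = []
    for proposed, target in zip(draft, verified):
        if proposed != target:
            break
        accepted.append(proposed)

    # 4. The token at the first mismatch (or the bonus token) comes from
    #    the base model, so every round emits at least one correct token.
    accepted.append(verified[len(accepted)])
    return context + accepted

The average number of tokens emitted per round is the acceptance length reported below.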

Average Token Acceptance Lengths (MLA):

Benchmark                Average Acceptance Length
HumanEval (Code)         3.372
GSM8K (Math)             3.165
Math500 (Complex Math)   3.490

These metrics demonstrate robust acceleration performance across diverse and complex domains.
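
As a back-of-envelope reading of these numbers (a rough cost model that ignores scheduling overhead; the relative draft cost is an assumed illustrative value, not a measurement):

# Rough speedup estimate from acceptance length (illustrative only).
def estimated_speedup(acceptance_length, steps=3, draft_cost=0.05):
    # Each round costs one base-model pass plus `steps` draft passes;
    # draft_cost is the assumed per-pass cost of the small drafter
    # relative to the base model.
    return acceptance_length / (1.0 + steps * draft_cost)

print(f"HumanEval: ~{estimated_speedup(3.372):.2f}x over 1 token/pass")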


Quick Start

Requirements

  • NVIDIA GPU
  • CUDA 12.0+
  • PyTorch 2.0+

Installation

pip install sglang==0.5.6

Inference with SGLang

python3 -m sglang.launch_server \
    --model-path /models/Kimi-K2-Instruct \
    --host 0.0.0.0 --port 30012 \
    --trust-remote-code \
    --attention-backend flashinfer \
    --mem-fraction-static 0.9 \
    --tp-size 8 \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path AQ-MedAI/Kimi-K2-Instruct-eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 1 \
    --speculative-num-draft-tokens 4
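
Once the server is running, it exposes an OpenAI-compatible API; a minimal client sketch follows (host, port, and model name here mirror the launch command above and may differ in your deployment):

import requests

# Query the OpenAI-compatible endpoint served by sglang.launch_server.
resp = requests.post(
    "http://localhost:30012/v1/chat/completions",
    json={
        "model": "Kimi-K2-Instruct",
        "messages": [
            {"role": "user", "content": "Explain speculative decoding in one sentence."}
        ],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])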

Training Data

The model was trained on 1.4 million samples sourced from the Open-PerfectBlend dataset. The data selection prioritizes high-quality instruction-following scenarios to maximize the draft model's predictive accuracy relative to the base model.

Citation

If you use this model in your research or application, please cite the following:

@misc{kimik2eagle3,
  title={Kimi-K2-Instruct-eagle3: Accelerating Instruction Following with EAGLE},
  author={Ant AQ Team},
  year={2025},
}