Model Overview
Kimi-K2-Instruct-eagle3 is a specialized draft model designed to accelerate the inference of the Kimi-K2-Instruct ecosystem using the EAGLE3 (Extrapolation Algorithm for Greater Language-model Efficiency) framework.
Kimi-K2-Instruct with EAGLE3 achieves up to 1.8x peak throughput versus the base model, accelerating generation across all 7 benchmarks, from +24% on MT-Bench to +80% on Math500 (configured with bs=8, steps=3, topk=1, num_draft_tokens=4).
Built upon the Llama architecture, this model acts as a highly efficient drafter. It has been trained on 1.4 million high-quality samples from the Open-PerfectBlend dataset, ensuring strict alignment with the teacher model's distribution.
This model serves as a general-purpose English instruction follower with strong capabilities in:
- Conversation
- Mathematical Reasoning
- Code Generation
Efficient Download Guide
To minimize download time and storage usage, note which files in the repository you actually need:
- For inference: download only config.json and model.safetensors.
- For continued training: training_state.pt contains the optimizer state needed to resume training. If you only intend to run inference, skip this file.
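If you fetch the repository with the huggingface_hub client, an allow_patterns filter keeps the download to the inference-only subset. A minimal sketch; the repo id matches the one used in the launch command below:

```python
# Sketch: download only the inference files, skipping training_state.pt.
# Assumes huggingface_hub is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="AQ-MedAI/Kimi-K2-Instruct-eagle3",
    allow_patterns=["config.json", "model.safetensors"],  # inference-only files
)
print(f"Draft model files downloaded to {local_dir}")
```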
Performance & Acceleration
The core value of this EAGLE model is its ability to draft multiple future tokens that the base model then verifies in a single forward pass. Higher average acceptance lengths mean fewer base-model passes per generated token, and therefore lower latency.
Average Token Acceptance Lengths (MLA):
| Benchmark | Average Acceptance Length |
|---|---|
| HumanEval (Code) | 3.372 |
| GSM8K (Math) | 3.165 |
| Math500 (Complex Math) | 3.490 |
These metrics demonstrate robust acceleration performance across diverse and complex domains.
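As rough intuition for how acceptance length translates into speedup: each base-model verification pass emits, on average, the full accepted draft rather than a single token. A back-of-the-envelope sketch follows; the 5% relative draft cost is an illustrative assumption, not a measured figure:

```python
# Back-of-the-envelope speedup estimate from average acceptance length.
# tau tokens are emitted per base-model verification pass; the draft model
# runs num_steps cheap forward passes per cycle. The 0.05 relative draft
# cost is an illustrative assumption, not a measured value.
def estimated_speedup(tau: float, num_steps: int = 3, draft_cost: float = 0.05) -> float:
    cycle_cost = 1.0 + num_steps * draft_cost  # one verify pass + draft passes
    return tau / cycle_cost

for bench, tau in [("HumanEval", 3.372), ("GSM8K", 3.165), ("Math500", 3.490)]:
    print(f"{bench}: ~{estimated_speedup(tau):.2f}x ideal speedup")
```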
Quick Start
Requirements
- NVIDIA GPU
- CUDA 12.0+
- PyTorch 2.0+
Installation
```bash
pip install sglang==0.5.6
```
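To confirm the pinned version is the one active in your environment (assuming, as in recent releases, that the package exposes `__version__`):

```python
# Quick sanity check that the pinned SGLang version is installed.
import sglang
print(sglang.__version__)  # expected: 0.5.6
```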
Inference with SGLang
```bash
python3 -m sglang.launch_server \
  --model-path /models/Kimi-K2-Instruct \
  --host 0.0.0.0 --port 30012 \
  --trust-remote-code \
  --attention-backend flashinfer \
  --mem-fraction-static 0.9 \
  --tp-size 8 \
  --speculative-algorithm EAGLE3 \
  --speculative-draft-model-path AQ-MedAI/Kimi-K2-Instruct-eagle3 \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4
```
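Once the server is up, it can be queried through SGLang's OpenAI-compatible endpoint. A minimal client sketch using requests; the port matches the launch command above, and the prompt is illustrative:

```python
# Minimal client sketch against SGLang's OpenAI-compatible chat endpoint.
# Port 30012 matches the launch command above; the model field is ignored
# by single-model servers but included for completeness.
import requests

resp = requests.post(
    "http://localhost:30012/v1/chat/completions",
    json={
        "model": "Kimi-K2-Instruct",
        "messages": [{"role": "user", "content": "Briefly explain speculative decoding."}],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```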
Training Data
The model was trained on 1.4 million samples sourced from the Open-PerfectBlend dataset. The data selection prioritizes high-quality instruction-following scenarios to maximize the draft model's predictive accuracy relative to the base model.
Citation
If you use this model in your research or application, please cite the following:
```bibtex
@misc{kimik2eagle3,
  title={Kimi-K2-Instruct-eagle3: Accelerating Instruction Following with EAGLE},
  author={Ant AQ Team},
  year={2025},
}
```