
Zhi-Create-Qwen3-32B-Eagle3

This is a speculator (draft) model designed for use with Zhihu-ai/Zhi-Create-Qwen3-32B, based on the EAGLE-3 speculative decoding algorithm. It was trained with the SpecForge library on a subset of the supervised fine-tuning (SFT) data used for Zhihu-ai/Zhi-Create-Qwen3-32B.
The model was trained in both thinking and non-thinking modes.
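Speculative decoding amortizes target-model forward passes: the EAGLE-3 draft head proposes several tokens, and the target model verifies them in a single pass, keeping the longest accepted prefix plus one bonus token. Under a toy model where each drafted token is accepted independently with probability a, the expected number of tokens emitted per target pass with n drafted tokens is (1 - a^(n+1)) / (1 - a). This is the standard speculative-decoding analysis, not a measured number for this model:

```python
def expected_tokens_per_pass(accept_rate: float, num_draft: int) -> float:
    """Expected tokens emitted per target-model forward pass, assuming each
    drafted token is accepted i.i.d. with probability `accept_rate` and one
    bonus token is always sampled from the target model."""
    # Geometric series: sum of accept_rate**k for k = 0..num_draft.
    return sum(accept_rate ** k for k in range(num_draft + 1))

print(expected_tokens_per_pass(0.8, 7))  # ~4.16 tokens per pass
```

The higher the draft head's acceptance rate, the closer this gets to num_draft + 1 tokens per target pass, which is where the speedup comes from.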

You can easily start a service using SGLang.

pip install "sglang[all]>=0.4.9"

python3 -m sglang.launch_server \
    --model Zhihu-ai/Zhi-Create-Qwen3-32B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 2 \
    --speculative-num-draft-tokens 8 \
    --tp 2 \
    --port 8000 \
    --dtype bfloat16 \
    --reasoning-parser deepseek-r1 \
    --served-model-name Zhi-Create-Qwen3-32B
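As a rough reading of the speculative flags above (my interpretation of the EAGLE draft-tree shape, not SGLang-documented semantics): `--speculative-eagle-topk 2` expands each frontier node into 2 children at each of the `--speculative-num-steps 3` draft steps, so the tree can hold up to 2 + 4 + 8 = 14 candidate tokens, while `--speculative-num-draft-tokens 8` bounds how many are actually sent to the target model for verification:

```python
def max_tree_candidates(topk: int, num_steps: int) -> int:
    # Each draft step expands every frontier node into `topk` children, so
    # the tree accumulates topk + topk**2 + ... + topk**num_steps candidates.
    return sum(topk ** d for d in range(1, num_steps + 1))

print(max_tree_candidates(2, 3))  # 14 candidates, capped by the 8-token budget
```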

# Send a request (the prompt asks for an article introducing West Lake
# vinegar fish, written in the style of Lu Xun)
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-Create-Qwen3-32B",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
# Alternative: use the OpenAI-compatible Python client
from openai import OpenAI
openai_api_key = "empty"
openai_api_base = "http://127.0.0.1:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base
)

def get_answer(messages):
    response = client.chat.completions.create(
        messages=messages,
        model="Zhi-Create-Qwen3-32B",
        max_tokens=4096,
        temperature=0.3,
        top_p=0.95,
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}}
    )
    answer = ""
    reasoning_content_all = ""
    for each in response:
        # A streamed delta may carry regular content, reasoning content, or neither.
        each_content = getattr(each.choices[0].delta, "content", None)
        reasoning_content = getattr(each.choices[0].delta, "reasoning_content", None)
        if each_content is not None:
            answer += each_content
            print(each_content, end="", flush=True)
        if reasoning_content is not None:
            reasoning_content_all += reasoning_content
            print(reasoning_content, end="", flush=True)
    return answer, reasoning_content_all

# The prompt asks for an article introducing West Lake vinegar fish in the style of Lu Xun.
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [
    {"role": "user", "content": prompt}
]

answer, reasoning_content_all = get_answer(messages)
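Since the model was trained in both thinking and non-thinking modes, the reasoning trace can be switched off by flipping the `enable_thinking` flag shown in `get_answer` above. A minimal sketch of building the `extra_body` argument (the flag name comes from the snippet above; everything else is standard OpenAI-client usage):

```python
def chat_template_extra_body(enable_thinking: bool) -> dict:
    # Passed as `extra_body=` to client.chat.completions.create(...);
    # SGLang forwards chat_template_kwargs to the model's chat template.
    return {"chat_template_kwargs": {"enable_thinking": enable_thinking}}

print(chat_template_extra_body(False))
```

With `enable_thinking` set to False, the streamed deltas should carry only regular content and no `reasoning_content`.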
Safetensors
Model size: 1.56B params
Tensor types: I64, BF16, BOOL

Model tree for Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3

Base model: Qwen/Qwen3-32B (this model is a finetune)

Datasets used to train Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3