# Zhi-Create-Qwen3-32B-Eagle3
This is a speculator (draft) model designed for use with Zhihu-ai/Zhi-Create-Qwen3-32B, based on the EAGLE-3 speculative decoding algorithm.
It was trained with the SpecForge library on a subset of the supervised fine-tuning (SFT) data used for Zhihu-ai/Zhi-Create-Qwen3-32B.
The model was trained in both thinking and non-thinking modes.
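Thinking mode is toggled per request via `chat_template_kwargs` / `enable_thinking`, as in the client example later in this card. The sketch below (a hypothetical helper, not part of any shipped API) only assembles the OpenAI-style request payload; field names mirror the curl and Python examples that follow.

```python
# Sketch: build an OpenAI-compatible chat payload for the SGLang server,
# toggling thinking mode through chat_template_kwargs.
# build_payload is an illustrative helper, not part of SGLang or this model.

def build_payload(prompt: str, enable_thinking: bool) -> dict:
    """Assemble a chat-completions request body for Zhi-Create-Qwen3-32B."""
    return {
        "model": "Zhi-Create-Qwen3-32B",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95,
        # Passed through to the chat template; False disables the
        # reasoning ("thinking") trace for this request.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }

thinking_request = build_payload("Introduce West Lake vinegar fish.", True)
plain_request = build_payload("Introduce West Lake vinegar fish.", False)
```

When calling through the `openai` client rather than raw HTTP, `chat_template_kwargs` goes under `extra_body`, as shown in the full example below.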
You can easily start a service using SGLang:

```shell
pip install "sglang[all]>=0.4.9"

python3 -m sglang.launch_server \
    --model Zhihu-ai/Zhi-Create-Qwen3-32B \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path Zhihu-ai/Zhi-Create-Qwen3-32B-Eagle3 \
    --speculative-num-steps 3 \
    --speculative-eagle-topk 2 \
    --speculative-num-draft-tokens 8 \
    --tp 2 \
    --port 8000 \
    --dtype bfloat16 \
    --reasoning-parser deepseek-r1 \
    --served-model-name Zhi-Create-Qwen3-32B
```
```shell
# Send a request
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Zhi-Create-Qwen3-32B",
        "prompt": "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章",
        "max_tokens": 4096,
        "temperature": 0.6,
        "top_p": 0.95
    }'
```
Alternatively, use the OpenAI-compatible client:

```python
from openai import OpenAI

openai_api_key = "empty"
openai_api_base = "http://127.0.0.1:8000/v1"
client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)


def get_answer(messages):
    response = client.chat.completions.create(
        messages=messages,
        model="Zhi-Create-Qwen3-32B",
        max_tokens=4096,
        temperature=0.3,
        top_p=0.95,
        stream=True,
        extra_body={"chat_template_kwargs": {"enable_thinking": True}},
    )
    answer = ""
    reasoning_content_all = ""
    for chunk in response:
        delta = chunk.choices[0].delta
        # The delta may carry regular content, reasoning content, or neither.
        content = getattr(delta, "content", None)
        reasoning_content = getattr(delta, "reasoning_content", None)
        if content is not None:
            answer += content
            print(content, end="", flush=True)
        if reasoning_content is not None:
            reasoning_content_all += reasoning_content
            print(reasoning_content, end="", flush=True)
    return answer, reasoning_content_all


# "Write an article introducing West Lake vinegar fish in the voice of Lu Xun"
prompt = "请你以鲁迅的口吻,写一篇介绍西湖醋鱼的文章"
messages = [
    {"role": "user", "content": prompt}
]
answer, reasoning_content_all = get_answer(messages)
```