---
license: apache-2.0
datasets:
- mlabonne/FineTome-100k
- open-r1/Mixture-of-Thoughts
base_model:
- Qwen/Qwen3-1.7B
tags:
- think
- reasoning
- qwen3
---

## Model details
This is a Qwen3 1.7B model fine-tuned on 20k conversations from open-r1/Mixture-of-Thoughts and 3k conversations from mlabonne/FineTome-100k to enhance its reasoning capabilities. The model is intended to run on weaker or older devices such as smartphones and aging laptops.
## How to run

You can run this model with any of several inference backends.
### transformers

Following the usage recommended by the Qwen team:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ertghiu256/qwen3-1.7b-mixture-of-thought"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
```
### vLLM

Run this command:
```shell
vllm serve ertghiu256/qwen3-1.7b-mixture-of-thought --enable-reasoning --reasoning-parser deepseek_r1
```
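Once the server is up, vLLM exposes an OpenAI-compatible API (port 8000 by default). A minimal client sketch, assuming the `openai` Python package is installed:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on localhost:8000 by default
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="ertghiu256/qwen3-1.7b-mixture-of-thought",
    messages=[{"role": "user", "content": "Give me a short introduction to large language models."}],
    temperature=0.6,
    top_p=0.95,
)
print(response.choices[0].message.content)
```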
### SGLang

Run this command:
```shell
python -m sglang.launch_server --model-path ertghiu256/qwen3-1.7b-mixture-of-thought --reasoning-parser deepseek-r1
```
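SGLang also serves an OpenAI-compatible endpoint, by default on port 30000, so you can query it the same way; a quick sketch with curl:

```shell
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ertghiu256/qwen3-1.7b-mixture-of-thought",
    "messages": [{"role": "user", "content": "What is 17 * 24?"}],
    "temperature": 0.6
  }'
```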
### llama.cpp

Run this command:
```shell
llama-server --hf-repo ertghiu256/qwen3-1.7b-mixture-of-thought
```

or

```shell
llama-cli -hf ertghiu256/qwen3-1.7b-mixture-of-thought
```
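To match the recommended sampling settings listed below, you can pass them as flags; a sketch, assuming a recent llama.cpp build:

```shell
# Extended thinking mode settings; -c sets the context window
llama-cli -hf ertghiu256/qwen3-1.7b-mixture-of-thought \
  -c 8192 --temp 0.6 --top-p 0.95 --top-k 10
```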
### Ollama

Run this command:
```shell
ollama run hf.co/ertghiu256/qwen3-1.7b-mixture-of-thought:Q4_K_M
```
### LM Studio

Search for `ertghiu256/qwen3-1.7b-mixture-of-thought` in the LM Studio model search list, then download it.
## Recommended parameters

### Extended thinking mode

- temp: 0.6
- num_ctx: ≥8192
- top_p: 0.95
- top_k: 10

### Short thinking mode

- temp: 0.5
- num_ctx: ≥4096
- top_p: 0.8
- top_k: 10
- min_p: 0.1
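With Ollama, these settings can be baked into a Modelfile; a minimal sketch for extended thinking mode, using the Q4_K_M quant from above:

```
# Extended thinking mode settings from the list above
FROM hf.co/ertghiu256/qwen3-1.7b-mixture-of-thought:Q4_K_M
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
PARAMETER top_p 0.95
PARAMETER top_k 10
```

Build it with `ollama create <name> -f Modelfile`, where `<name>` is whatever local tag you prefer.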
## Training details

- LoRA rank: 32
- Learning rate: 1e-4
- Steps: 70
- Datasets:
  - FlameF0X/Mixture-of-Thoughts-2048T
  - mlabonne/FineTome-100k