Eval

+-------------+------------+-----------------+---------------+-------+---------+---------+
| Model       | Dataset    | Metric          | Subset        |   Num |   Score | Cat.0   |
+=============+============+=================+===============+=======+=========+=========+
| model       | gpqa       | AveragePass@1   | gpqa_extended |    50 |  0.34   | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | gpqa       | AveragePass@1   | gpqa_main     |    50 |  0.32   | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | gpqa       | AveragePass@1   | gpqa_diamond  |    50 |  0.32   | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | gpqa       | AveragePass@1   | OVERALL       |   150 |  0.3267 | -       |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | gsm8k      | AverageAccuracy | main          |    50 |  0.76   | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Act.EM          | in_domain     |    42 |  0.2619 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Act.EM          | out_of_domain |    47 |  0.3617 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Act.EM          | OVERALL       |    89 |  0.3146 | -       |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Plan.EM         | in_domain     |     0 |  0      | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Plan.EM         | out_of_domain |     0 |  0      | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Plan.EM         | OVERALL       |     0 |  0      | -       |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | F1              | in_domain     |    42 |  0.2095 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | F1              | out_of_domain |    47 |  0.2527 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | F1              | OVERALL       |    89 |  0.2323 | -       |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | HalluRate       | in_domain     |    42 |  0.119  | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | HalluRate       | out_of_domain |    47 |  0.0851 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | HalluRate       | OVERALL       |    89 |  0.1011 | -       |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Rouge-L         | in_domain     |    42 |  0.0394 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Rouge-L         | out_of_domain |    47 |  0.0676 | default |
+-------------+------------+-----------------+---------------+-------+---------+---------+
| model       | tool_bench | Rouge-L         | OVERALL       |    89 |  0.0543 | -       |
+-------------+------------+-----------------+---------------+-------+---------+---------+
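
The OVERALL rows in the table appear to be sample-weighted means of the subset scores, weighted by the Num column. The sketch below recomputes them from the values in the table as a sanity check. The helper name `weighted_overall` is illustrative and not part of any evaluation framework. Returning 0.0 for a metric with no scored samples is an assumption made to mirror the Plan.EM rows.

```python
# (num_samples, score) pairs copied from the table above, one list per metric.
SUBSETS = {
    ("gpqa", "AveragePass@1"): [(50, 0.34), (50, 0.32), (50, 0.32)],
    ("tool_bench", "Act.EM"): [(42, 0.2619), (47, 0.3617)],
    ("tool_bench", "Plan.EM"): [(0, 0.0), (0, 0.0)],
    ("tool_bench", "F1"): [(42, 0.2095), (47, 0.2527)],
    ("tool_bench", "HalluRate"): [(42, 0.119), (47, 0.0851)],
    ("tool_bench", "Rouge-L"): [(42, 0.0394), (47, 0.0676)],
}


def weighted_overall(rows):
    """Return the mean score weighted by sample count (hypothetical helper)."""
    total = sum(num for num, _ in rows)
    if total == 0:
        # No samples were scored (as for Plan.EM); report 0.0 by assumption.
        return 0.0
    return sum(num * score for num, score in rows) / total


def overall_table():
    """Map each (dataset, metric) pair to its recomputed OVERALL score."""
    return {key: round(weighted_overall(rows), 4) for key, rows in SUBSETS.items()}
```

Calling `overall_table()` gives one rounded value per metric, which can be compared against the OVERALL rows above.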

Use this model

with llama-cli

llama-cli -m Qwen3-4B-toolcall.Q4_K_M.gguf
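
To drive the same command from a script, a minimal sketch is shown here. It assumes `llama-cli` is on the PATH and the GGUF file is in the working directory. The `-p` (prompt) and `-n` (number of tokens to predict) flags are standard llama-cli options, but the function names and the default of 256 tokens are illustrative choices.

```python
import subprocess

MODEL_PATH = "Qwen3-4B-toolcall.Q4_K_M.gguf"  # same file as in the command above


def build_llama_cli_command(prompt, model_path=MODEL_PATH, max_tokens=256):
    """Build the argument list for llama-cli (hypothetical helper)."""
    return ["llama-cli", "-m", model_path, "-p", prompt, "-n", str(max_tokens)]


def run_llama_cli(prompt):
    """Run llama-cli with the given prompt; raises if the process fails."""
    return subprocess.run(build_llama_cli_command(prompt), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains spaces or quotes.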

with ollama

Create a Modelfile named Qwen3-4B-toolcall.Q4_K_M.txt with the following content:

FROM ./Qwen3-4B-toolcall.Q4_K_M.gguf
TEMPLATE """<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""

Then create the model with ollama:

ollama create Qwen3-4B-toolcall.Q4_K_M -f Qwen3-4B-toolcall.Q4_K_M.txt

Then run it:

ollama run Qwen3-4B-toolcall.Q4_K_M
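
Once the model has been created, it can also be queried through Ollama's local REST API instead of the interactive prompt. The sketch below assumes the Ollama server is running on its default address (`http://localhost:11434`) and that the model was created under the name used above. The function names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_NAME = "Qwen3-4B-toolcall.Q4_K_M"  # name passed to `ollama create`


def build_payload(prompt, model=MODEL_NAME):
    """Build a non-streaming request body for /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, url=OLLAMA_URL, timeout=120):
    """Send the prompt to the local Ollama server and return the reply text."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    # With "stream": False the server returns one JSON object whose
    # "response" field holds the generated text.
    return body["response"]
```

With the server running, `generate("Which tool would you call to look up the weather?")` returns the model's reply as a string.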
