Eval

+---------+------------+-----------------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric          | Subset        |   Num |   Score | Cat.0   |
+=========+============+=================+===============+=======+=========+=========+
| model   | gpqa       | AveragePass@1   | gpqa_extended |    50 |  0.26   | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_main     |    50 |  0.36   | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_diamond  |    50 |  0.38   | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | OVERALL       |   150 |  0.3333 | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gsm8k      | AverageAccuracy | main          |    50 |  0.68   | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | in_domain     |    41 |  0.3171 | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | out_of_domain |    47 |  0.3191 | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | OVERALL       |    88 |  0.3182 | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+

+---------+------------+-----------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric    | Subset        |   Num |   Score | Cat.0   |
+=========+============+===========+===============+=======+=========+=========+
| qwen    | tool_bench | Act.EM    | in_domain     |    42 |  0.1667 | default |
+---------+------------+-----------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM    | out_of_domain |    48 |  0.1667 | default |
+---------+------------+-----------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM    | OVERALL       |    90 |  0.1667 | -       |
+---------+------------+-----------+---------------+-------+---------+---------+
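
The OVERALL rows above are consistent with a sample-weighted mean of the subset scores, weighted by the Num column. The following is a minimal sketch under that assumption; the helper name `weighted_overall` is hypothetical and is not part of any evaluation tool. It recomputes the three OVERALL values from the subset rows in the tables.

```python
def weighted_overall(subsets):
    """Return the Num-weighted mean of (num, score) pairs, rounded to 4 places.

    Assumption: OVERALL is aggregated this way; the tables do not state it.
    """
    total = sum(num for num, _ in subsets)
    return round(sum(num * score for num, score in subsets) / total, 4)


# (Num, Score) pairs copied from the subset rows of the tables.
gpqa_model = [(50, 0.26), (50, 0.36), (50, 0.38)]
tool_bench_model = [(41, 0.3171), (47, 0.3191)]
tool_bench_qwen = [(42, 0.1667), (48, 0.1667)]

print(weighted_overall(gpqa_model))        # 0.3333
print(weighted_overall(tool_bench_model))  # 0.3182
print(weighted_overall(tool_bench_qwen))   # 0.1667
```

Because the subset scores are themselves rounded to four places, this check can only confirm agreement to that precision.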

Model: wesjos/Qwen3-4B-toolcall
Format: Safetensors
Model size: 4.02B params
Tensor type: BF16, F16