Eval
+---------+------------+-----------------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric          | Subset        | Num   | Score   | Cat.0   |
+=========+============+=================+===============+=======+=========+=========+
| model   | gpqa       | AveragePass@1   | gpqa_extended | 50    | 0.26    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_main     | 50    | 0.36    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_diamond  | 50    | 0.38    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | OVERALL       | 150   | 0.3333  | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gsm8k      | AverageAccuracy | main          | 50    | 0.68    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | in_domain     | 41    | 0.3171  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | out_of_domain | 47    | 0.3191  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | OVERALL       | 88    | 0.3182  | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+

+---------+------------+-----------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric    | Subset        | Num   | Score   | Cat.0   |
+=========+============+===========+===============+=======+=========+=========+
| qwen    | tool_bench | Act.EM    | in_domain     | 42    | 0.1667  | default |
+---------+------------+-----------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM    | out_of_domain | 48    | 0.1667  | default |
+---------+------------+-----------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM    | OVERALL       | 90    | 0.1667  | -       |
+---------+------------+-----------+---------------+-------+---------+---------+
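The OVERALL rows in the tables appear consistent with a sample-weighted mean of the subset scores, i.e. each subset's Score weighted by its Num. For tool_bench this distinction is visible: weighting by Num gives 0.3182 (matching the table), while a plain mean of the two subset scores gives 0.3181. A minimal Python sketch that recomputes the OVERALL rows under that assumption follows; the helper names are hypothetical and not part of any evaluation framework's API.

```python
# Hypothetical helpers: recompute an OVERALL row from per-subset (Num, Score) pairs.
# Assumption: OVERALL is the sample-weighted mean of the subset scores.

def weighted_overall(rows):
    """rows: list of (num_samples, score) tuples; returns the Num-weighted mean score."""
    total = sum(n for n, _ in rows)
    return sum(n * s for n, s in rows) / total

def unweighted_overall(rows):
    """Plain mean of subset scores, ignoring subset sizes (shown for comparison)."""
    return sum(s for _, s in rows) / len(rows)

# Per-subset (Num, Score) rows copied from the evaluation tables.
gpqa = [(50, 0.26), (50, 0.36), (50, 0.38)]
tool_bench_model = [(41, 0.3171), (47, 0.3191)]
tool_bench_qwen = [(42, 0.1667), (48, 0.1667)]

print(round(weighted_overall(gpqa), 4))                # 0.3333
print(round(weighted_overall(tool_bench_model), 4))    # 0.3182
print(round(unweighted_overall(tool_bench_model), 4))  # 0.3181
print(round(weighted_overall(tool_bench_qwen), 4))     # 0.1667
```

For gpqa all three subsets have 50 samples, so the weighted and unweighted means coincide; the weighting only matters when subset sizes differ, as with tool_bench.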
Model tree for wesjos/Qwen3-4B-toolcall