Model information
- Fine-tuned from: unsloth/Qwen3-4B-unsloth-bnb-4bit
- Datasets: interstellarninja/tool-calls-single-reasoning, mlabonne/FineTome-100k, unsloth/OpenMathReasoning-mini
Eval
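- Overall scores improved on all three benchmarks after fine-tuning (unsloth/Qwen3-4B-unsloth-bnb-4bit --> this model):
- gpqa (AveragePass@1, 150 samples): 0.24 --> 0.3333
- gsm8k (AverageAccuracy, 50 samples): 0.5 --> 0.68
- tool_bench (Act.EM, 90 samples before, 88 after): 0.1667 --> 0.3182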
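The gains listed above can also be restated as absolute and relative improvements. The following is a minimal sketch: the `scores` dictionary and the `gains` helper are illustrative names, not part of any evaluation tooling, and the numbers are copied from the overall scores in this card.

```python
# Overall scores copied from the summary list: (before fine-tuning, after fine-tuning).
scores = {
    "gpqa": (0.24, 0.3333),
    "gsm8k": (0.5, 0.68),
    "tool_bench": (0.1667, 0.3182),
}


def gains(before: float, after: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain) for one benchmark."""
    absolute = after - before
    relative = absolute / before
    return absolute, relative


for name, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{name}: {absolute:+.4f} absolute, {relative:+.1%} relative")
```

With only 50 samples per subset for gpqa and gsm8k, each sample is worth 0.02 on a subset score, so small differences should be read with caution.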
Before fine-tuning (unsloth/Qwen3-4B-unsloth-bnb-4bit)
+---------+------------+-----------------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric          | Subset        | Num   | Score   | Cat.0   |
+=========+============+=================+===============+=======+=========+=========+
| qwen    | gpqa       | AveragePass@1   | gpqa_extended | 50    | 0.24    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | gpqa       | AveragePass@1   | gpqa_main     | 50    | 0.26    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | gpqa       | AveragePass@1   | gpqa_diamond  | 50    | 0.22    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | gpqa       | AveragePass@1   | OVERALL       | 150   | 0.24    | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | gsm8k      | AverageAccuracy | main          | 50    | 0.5     | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM          | in_domain     | 42    | 0.1667  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM          | out_of_domain | 48    | 0.1667  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| qwen    | tool_bench | Act.EM          | OVERALL       | 90    | 0.1667  | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
After fine-tuning (this model)
+---------+------------+-----------------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric          | Subset        | Num   | Score   | Cat.0   |
+=========+============+=================+===============+=======+=========+=========+
| model   | gpqa       | AveragePass@1   | gpqa_extended | 50    | 0.26    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_main     | 50    | 0.36    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_diamond  | 50    | 0.38    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | OVERALL       | 150   | 0.3333  | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gsm8k      | AverageAccuracy | main          | 50    | 0.68    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | in_domain     | 41    | 0.3171  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | out_of_domain | 47    | 0.3191  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | OVERALL       | 88    | 0.3182  | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
Qwen/Qwen3-4B model
+---------+------------+-----------------+---------------+-------+---------+---------+
| Model   | Dataset    | Metric          | Subset        | Num   | Score   | Cat.0   |
+=========+============+=================+===============+=======+=========+=========+
| model   | gpqa       | AveragePass@1   | gpqa_extended | 50    | 0.32    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_main     | 50    | 0.22    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | gpqa_diamond  | 50    | 0.18    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gpqa       | AveragePass@1   | OVERALL       | 150   | 0.24    | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | gsm8k      | AverageAccuracy | main          | 50    | 0.48    | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | in_domain     | 43    | 0.1628  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | out_of_domain | 47    | 0.1702  | default |
+---------+------------+-----------------+---------------+-------+---------+---------+
| model   | tool_bench | Act.EM          | OVERALL       | 88    | 0.1667  | -       |
+---------+------------+-----------------+---------------+-------+---------+---------+
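The OVERALL rows in the tables above appear consistent with a sample-weighted mean of the subset scores. This is an assumption inferred from the reported numbers, not something the card states about the evaluation tool. The sketch below checks it against the rows reported for the fine-tuned model; `rows` and `weighted_overall` are illustrative names.

```python
# (subset, num_samples, score) rows copied from the table for the fine-tuned model.
rows = {
    "gpqa": [
        ("gpqa_extended", 50, 0.26),
        ("gpqa_main", 50, 0.36),
        ("gpqa_diamond", 50, 0.38),
    ],
    "tool_bench": [
        ("in_domain", 41, 0.3171),
        ("out_of_domain", 47, 0.3191),
    ],
}


def weighted_overall(subsets):
    """Sample-weighted mean of subset scores (assumed aggregation rule)."""
    total = sum(num for _, num, _ in subsets)
    return sum(num * score for _, num, score in subsets) / total


for dataset, subsets in rows.items():
    # Rounded to four decimals to match the precision used in the tables.
    print(dataset, round(weighted_overall(subsets), 4))
```

Under this assumption the computed values match the reported OVERALL scores of 0.3333 for gpqa and 0.3182 for tool_bench.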
Model tree for wesjos/Qwen3-4B-toolcall