🚀 Qwen3-4B-I-1509

🧾 Model Overview

  • 🏗️ Base Model: Qwen3-4B-Instruct-2507
  • 🎯 Training Method: Reinforcement Learning (GRPO) with multiple reward functions
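
GRPO scores each sampled completion relative to the other completions generated for the same prompt. The following is a minimal sketch of that idea, not the training code used for this model: the three signal values per completion, the equal weights, and the helper names `combine_rewards` and `group_relative_advantages` are all hypothetical.

```python
from statistics import mean, pstdev

def combine_rewards(signals, weights):
    """Weighted sum of several reward signals for one completion."""
    return sum(w * s for w, s in zip(weights, signals))

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize rewards within one prompt's group of completions."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four completions for one prompt, each scored by three hypothetical signals.
group_signals = [
    [1.0, 0.8, 1.0],
    [0.5, 0.6, 1.0],
    [0.0, 0.9, 0.0],
    [0.5, 0.4, 1.0],
]
weights = [1.0, 1.0, 1.0]  # assumed equal weighting; the real weights are not stated
rewards = [combine_rewards(s, weights) for s in group_signals]
advantages = group_relative_advantages(rewards)
```

Completions that score above the group mean get a positive advantage and are reinforced; those below the mean get a negative one.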

This model (Qwen3-4B-I-1509) is fine-tuned for 🔧 tool use and 📞 function-call generation.
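
To consume the model's function calls in an application, the generated text has to be turned back into structured data. The sketch below assumes the output follows the Qwen-style convention of wrapping a JSON object in `<tool_call>` tags. The `get_weather` tool, the sample output, and the helper `extract_tool_calls` are hypothetical and only illustrate the parsing step.

```python
import json
import re

# Assumed output format: <tool_call> tags wrapping one JSON object per call.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text):
    """Return the well-formed tool calls found in generated text."""
    calls = []
    for payload in TOOL_CALL_RE.findall(text):
        try:
            obj = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of raising
        if isinstance(obj, dict) and isinstance(obj.get("name"), str):
            calls.append({"name": obj["name"], "arguments": obj.get("arguments", {})})
    return calls

# Hypothetical model output for a weather question.
sample_output = (
    "<tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "celsius"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample_output)
```

Malformed payloads are dropped silently here; a production caller may prefer to log them or ask the model to retry.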


πŸ† Reward Functions

The model was trained with multi-signal rewards:

  1. πŸ“ Rule-based Reward
    βœ”οΈ Checks correctness of function call name and arguments.
    βž• Partial credit for matching subsets of arguments.

  2. πŸ”’ Self-Certainty Reward
    ⚑ Encourages confident predictions.

  3. πŸ”§ Tool-Call Reward
    βœ… Validates structural correctness.
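
The rule-based and tool-call rewards can be sketched as follows. This is an illustration under stated assumptions, not the reward code used in training: the exact scoring rules are not published here, so the choice to give zero reward for a wrong function name, to ignore extra arguments, and to require a JSON object with `name` and `arguments` keys is hypothetical.

```python
import json

def tool_call_reward(raw_call):
    """1.0 if the call is valid JSON with a string name and dict arguments, else 0.0."""
    try:
        call = json.loads(raw_call)
    except (json.JSONDecodeError, TypeError):
        return 0.0
    ok = (
        isinstance(call, dict)
        and isinstance(call.get("name"), str)
        and isinstance(call.get("arguments"), dict)
    )
    return 1.0 if ok else 0.0

def rule_based_reward(predicted, expected):
    """Score name and arguments; matching a subset of arguments earns partial credit."""
    if predicted.get("name") != expected["name"]:
        return 0.0  # assumed: a wrong function name earns nothing
    exp_args = expected.get("arguments", {})
    if not exp_args:
        return 1.0
    pred_args = predicted.get("arguments", {})
    # Count expected arguments reproduced exactly; extra arguments are ignored.
    matched = sum(1 for k, v in exp_args.items() if k in pred_args and pred_args[k] == v)
    return matched / len(exp_args)

# Hypothetical reference call and a prediction with one wrong argument value.
expected = {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "celsius"}}
predicted = {"name": "get_weather", "arguments": {"city": "Hanoi", "unit": "kelvin"}}
partial_score = rule_based_reward(predicted, expected)
```

In this example one of the two expected arguments matches, so the prediction earns half credit from the rule-based signal while still passing the structural check.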


βš™οΈ Training Configuration

  • ⚑ Optimizer: AdamW
  • πŸ“‰ Learning Rate: 5e-6 with cosine decay (min_lr_rate=0.1)
  • ⏳ Scheduler: cosine_with_min_lr
  • πŸ”„ Generations per Prompt: 4
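
To make the learning-rate settings concrete, here is a small sketch of a cosine decay that floors at a fraction of the base rate, which is how a `cosine_with_min_lr` schedule with `min_lr_rate=0.1` is generally understood to behave. The total step count and the absence of warmup are assumptions, since neither is stated above.

```python
import math

def cosine_with_min_lr(step, total_steps, base_lr=5e-6, min_lr_rate=0.1, warmup_steps=0):
    """Learning rate at `step` for a cosine decay that floors at base_lr * min_lr_rate."""
    if warmup_steps and step < warmup_steps:
        return base_lr * step / warmup_steps  # linear warmup (unused by default)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # falls from 1 to 0
    factor = cosine * (1.0 - min_lr_rate) + min_lr_rate  # falls from 1 to min_lr_rate
    return base_lr * factor

total_steps = 1000  # hypothetical; the real number of training steps is not stated
start_lr = cosine_with_min_lr(0, total_steps)
mid_lr = cosine_with_min_lr(500, total_steps)
end_lr = cosine_with_min_lr(1000, total_steps)
```

Under these assumptions the rate starts at 5e-6 and ends at 5e-7, one tenth of the base value, instead of decaying all the way to zero.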

📊 Evaluation Results

Important notes:

  • Why are the scores lower than in the technical report?

    Because of hardware limits, I had to reduce the maximum number of tokens during evaluation for both models.

  • Is the evaluation fair?

    I use the same configuration for every model I evaluate, whether it is larger or the same size.

Tau-Bench

| 🧠 Model | ✈️ Airline | 🛍️ Retail |
|---|---|---|
| Qwen3-4B-I-1509 | 0.2800 | 0.2783 |
| Base Model | 0.3000 | 0.2261 |

ACEBench

| Model | Overall Accuracy |
|---|---|
| Qwen3-4B-I-1509 | 0.677 |
| Qwen3-4B-Instruct-2507 (base) | 0.635 |
| Salesforce/Llama-xLAM-2-8b-fc-r | 0.5792 |

More results are being added.


Contribute:

I would be happy to receive contributions to this model, as well as feedback on its performance and quality.

Support me at:

Buy Me A Coffee

📖 Citation

If you use this model in your research or application, please cite:

@misc{qwen3-4b-i-1509,
  title        = {Qwen3-4B-I-1509: Fine-tuned Qwen3-4B-Instruct with GRPO for Tool-Use and Function Calling},
  author       = {Beyoru},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/beyoru/Qwen3-4B-I-1509}}
}