This model is a fine-tuned version of Qwen/Qwen2.5-Coder-3B, trained with Multi-Agent Group Relative Policy Optimization (MAGRPO) on the HumanEval dataset.
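For readers unfamiliar with the "group relative" part of the method name, the sketch below illustrates the general idea behind Group Relative Policy Optimization: each sampled completion's reward is normalized against the other completions drawn for the same prompt. This is a minimal illustration only, not the training code used for this model. The function name, the `eps` parameter, and the reward values are hypothetical, and how MAGRPO distributes rewards across multiple agents is not described in this card.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group.

    rewards: scalar rewards for completions sampled from the same prompt.
    eps: small constant (an assumption) to avoid division by zero when
         every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions sampled from one prompt.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Completions that score above the group mean receive a positive advantage and those below receive a negative one, so no separate value network is needed to provide a baseline.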