Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1

This repository contains a checkpoint trained with GRPO (Group Relative Policy Optimization) on the open-r1/DAPO-Math-17k-Processed dataset, starting from Qwen/Qwen2.5-1.5B-Instruct.
This snapshot corresponds to training step 1.
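
The card does not state which training stack or reward function produced this checkpoint. As a rough sketch of the setup described above, the example below runs GRPO with TRL's `GRPOTrainer` on the same dataset and base model; the reward function, hyperparameters, and dataset split name are placeholders, not the values actually used.

```python
# Hypothetical GRPO training sketch using TRL. The reward function and all
# hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Split/config names are assumptions; adjust to the dataset's actual layout.
dataset = load_dataset("open-r1/DAPO-Math-17k-Processed", split="train")

def placeholder_reward(completions, **kwargs):
    # Placeholder reward: mildly favors longer completions. A real math setup
    # would instead score answer correctness against the reference solution.
    return [min(len(str(c)), 1000) / 1000 for c in completions]

training_args = GRPOConfig(
    output_dir="Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1",
    per_device_train_batch_size=8,
    num_generations=8,
    save_steps=1,  # this repository corresponds to the step-1 checkpoint
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=placeholder_reward,
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```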

Contents include:

  • Model weights (.safetensors)
  • Config files (config.json, generation_config.json)
  • Tokenizer files (tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, special_tokens_map.json, added_tokens.json)
  • Optional chat template (chat_template.jinja)

Training artifacts (optimizer/scheduler states and RNG) have been intentionally excluded.
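
A minimal inference sketch, assuming the standard `transformers` API (`AutoTokenizer` / `AutoModelForCausalLM`) and the repository ID shown on this card; the prompt and generation settings are illustrative only:

```python
# Minimal inference sketch; generation settings here are illustrative and
# do not necessarily match the bundled generation_config.json.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float32,  # weights are stored in F32
    device_map="auto",
)

# The bundled chat template formats the conversation the way the model expects.
messages = [{"role": "user", "content": "Solve: 12 * 7 + 5 = ?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```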

Model size: 1.54B parameters (safetensors)
Tensor type: F32

Model tree for AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1:

  • Base model: Qwen/Qwen2.5-1.5B
  • Fine-tuned from: Qwen/Qwen2.5-1.5B-Instruct
  • This model: AzalKhan/Qwen2.5-1.5B-Instruct_open-r1-DAPO-Math-17k-Processed_1

Dataset used to train this model: open-r1/DAPO-Math-17k-Processed