Model Card for abaryan/GRPO_GSM8K_Qwen2.5-1.5B_NoQuantisation

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub: Qwen2.5-1.5B fine-tuned with GRPO on GSM8K. This model card has been automatically generated.

  • Developed by: [Abaryan]
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: [Abaryan]
  • Model type: [GRPO + CoT]
  • Language(s) (NLP): [More Information Needed]
  • License: [More Information Needed]
  • Finetuned from model: [Qwen/Qwen2.5-1.5B]
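
A minimal loading sketch, assuming the checkpoint is available under the repository ID listed on this page and loads through the standard transformers auto classes. The card does not document the prompt used during training, so `build_prompt` is a hypothetical placeholder and should be replaced once the real format is known. Loading in bf16 mirrors the training precision stated in this card, although the stored tensors are listed as F32.

```python
# Repository ID as listed on this page.
MODEL_ID = "abaryan/GRPO_GSM8K_Qwen2.5-1.5B_NoQuantisation"


def build_prompt(question: str) -> str:
    """Hypothetical chain-of-thought prompt; the training prompt is not documented."""
    return f"Question: {question.strip()}\nLet's think step by step.\n"


def generate_answer(question: str, max_new_tokens: int = 256) -> str:
    """Generate a completion for one question (downloads the weights on first use)."""
    # Imported lazily so build_prompt can be used without torch/transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # bf16 is an assumption based on the training precision stated in the card.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
    model.eval()

    inputs = tokenizer(build_prompt(question), return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Strip the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling `generate_answer("...")` with a grade-school math question should return the model's reasoning followed by its answer; the exact output format depends on how the model was trained.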

Training Details

Training Data

[GSM8K]
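
GSM8K reference solutions end with a line of the form `#### <final answer>`. The sketch below shows one way a correctness reward for GRPO could be derived from that format. The reward actually used for this model is not documented, so `correctness_reward` and its "last number in the completion" rule are assumptions for illustration only.

```python
import re
from typing import Optional

# A number with optional sign, thousands separators, and decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")
# GSM8K marks the final answer of a reference solution with "####".
REFERENCE = re.compile(r"####\s*(-?\d[\d,]*(?:\.\d+)?)")


def extract_reference_answer(solution: str) -> Optional[str]:
    """Return the final answer of a GSM8K reference solution, or None if absent."""
    match = REFERENCE.search(solution)
    if match is None:
        return None
    return match.group(1).replace(",", "")


def correctness_reward(completion: str, reference_solution: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the completion matches, else 0.0."""
    target = extract_reference_answer(reference_solution)
    numbers = NUMBER.findall(completion)
    if target is None or not numbers:
        return 0.0
    predicted = numbers[-1].replace(",", "")
    return 1.0 if float(predicted) == float(target) else 0.0
```

A binary reward like this is easy to verify but gives no credit for partially correct reasoning; format rewards are often added alongside it.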

Training Hyperparameters

  • Training regime: [More Information Needed]

Speeds, Sizes, Times [optional]

[bf16, no quantisation, no LoRA, batch size = 5, number of generations = 5, 3000 steps]
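
In GRPO, the number of generations is usually the number of completions sampled per prompt, and each completion is scored relative to the others in its group. The sketch below illustrates that group-relative advantage for a group of 5, assuming rewards are normalised by the group mean and population standard deviation. The function name, the `eps` value, and the choice of population standard deviation are assumptions, not details taken from this model's training code.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalise each reward against the other completions for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps keeps the division finite when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, 5 sampled completions, binary correctness rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0, 0.0])
```

Correct completions receive positive advantages and incorrect ones negative advantages. When all 5 completions earn the same reward, every advantage is zero and that prompt contributes no learning signal.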

Evaluation

Metrics

[More Information Needed]

Results

[More Information Needed]

Model Architecture and Objective

[Transformers]

Compute Infrastructure

[More Information Needed]

Hardware

[2x 4080s]

Software

[CUDA 12.6, PyTorch 2.6]

BibTeX:

[More Information Needed]

APA:

Safetensors

  • Model size: 1.54B params
  • Tensor type: F32

Model tree for abaryan/GRPO_GSM8K_Qwen2.5-1.5B_NoQuantisation

  • Base model: Qwen/Qwen2.5-1.5B
  • Finetuned (907): this model

Dataset used to train abaryan/GRPO_GSM8K_Qwen2.5-1.5B_NoQuantisation