---
library_name: transformers
datasets:
- openai/gsm8k
language:
- en
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->



## Model Details

### Model Description

<!-- Provide a longer summary of what this model is. -->

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

- **Developed by:** [Abaryan]
- **Funded by [optional]:** [More Information Needed]
- **Shared by:** [Abaryan]
- **Model type:** [GRPO + CoT: fine-tuned with Group Relative Policy Optimization (GRPO) to produce chain-of-thought (CoT) reasoning]
- **Language(s) (NLP):** [English]
- **License:** [More Information Needed]
- **Finetuned from model:** [Qwen/Qwen2.5-1.5B-Instruct]

## Training Details

### Training Data


[GSM8K (`openai/gsm8k`), a dataset of grade-school math word problems]
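GSM8K reference solutions end with a line of the form `#### <answer>`, which makes a simple correctness reward possible for GRPO training. The card does not document the reward functions or the answer format used to train this model, so the following is only a hypothetical sketch: the function names are illustrative, and it assumes the last number in a completion is the model's final answer.

```python
import re
from typing import Optional

# GSM8K reference solutions end with a line of the form "#### <answer>".
def extract_gold_answer(solution: str) -> str:
    """Return the final answer from a GSM8K reference solution, commas removed."""
    return solution.split("####")[-1].strip().replace(",", "")

# ASSUMPTION (hypothetical convention): the last number in the completion is
# taken as the model's final answer. The real training format is not documented.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")

def extract_predicted_answer(completion: str) -> Optional[str]:
    """Return the last number found in the completion, or None if there is none."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return None
    return matches[-1].replace(",", "")

def correctness_reward(completion: str, solution: str) -> float:
    """Binary reward: 1.0 if the predicted answer numerically equals the gold answer."""
    predicted = extract_predicted_answer(completion)
    if predicted is None:
        return 0.0
    try:
        return 1.0 if float(predicted) == float(extract_gold_answer(solution)) else 0.0
    except ValueError:
        # Gold or predicted answer is not a plain number.
        return 0.0

if __name__ == "__main__":
    gold = "Natalia sold 48/2 = 24 clips in May.\n#### 72"
    print(correctness_reward("So she sold 72 clips in total.", gold))  # 1.0
```

A binary reward like this gives GRPO a verifiable signal without a learned reward model; format-checking rewards (for example, for the reasoning layout) are often added alongside it.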


#### Training Hyperparameters

- **Training regime:** [bf16, no quantization] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

#### Speeds, Sizes, Times [optional]

<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- **Precision:** bf16, no quantization
- **Adapters:** none (no LoRA)
- **Batch size:** 5
- **Generations per prompt:** 5
- **Training steps:** 3,000
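GRPO samples several completions per prompt and standardizes each completion's reward against the others in its group, so no separate value model is needed. The sketch below illustrates that advantage step for the 5 generations per prompt listed above. It is a plain-Python illustration, not the training code used for this model; the function names, the `eps` value, the use of the population standard deviation, and the assumption that completions arrive grouped by prompt are all assumptions (implementations differ on these details).

```python
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Standardize rewards within one prompt's group: (r - mean) / (std + eps).

    ASSUMPTION: population std and eps=1e-4; other implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def advantages_for_batch(rewards: List[float], num_generations: int = 5) -> List[float]:
    """Compute advantages for a flat list of rewards.

    ASSUMPTION (hypothetical layout): rewards are ordered so that each
    consecutive run of `num_generations` values belongs to the same prompt.
    """
    if len(rewards) % num_generations != 0:
        raise ValueError("number of rewards must be a multiple of num_generations")
    advantages: List[float] = []
    for start in range(0, len(rewards), num_generations):
        group = rewards[start:start + num_generations]
        advantages.extend(group_relative_advantages(group))
    return advantages
```

For a group where only one of five completions is correct (`[1, 0, 0, 0, 0]`), the correct completion gets an advantage of about +2 and the others about -0.5. A group where every completion earns the same reward yields all-zero advantages, so it contributes no gradient signal.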

## Evaluation


#### Metrics

<!-- These are the evaluation metrics being used, ideally with a description of why. -->

[More Information Needed]

### Results

[More Information Needed]

### Model Architecture and Objective

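[Decoder-only Transformer (architecture inherited from Qwen2.5-1.5B-Instruct), fine-tuned with the GRPO reinforcement-learning objective]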

### Compute Infrastructure

[More Information Needed]

#### Hardware

[2x NVIDIA RTX 4080s]

#### Software

[CUDA 12.6, PyTorch 2.6]

**BibTeX:**

[More Information Needed]

**APA:**