|
--- |
|
library_name: transformers |
|
datasets: |
|
- openai/gsm8k |
|
language: |
|
- en |
|
base_model: |
|
- Qwen/Qwen2.5-1.5B-Instruct |
|
pipeline_tag: reinforcement-learning |
|
--- |
|
|
|
# Model Card for Qwen2.5-1.5B-Instruct, GRPO-Fine-Tuned on GSM8K
|
|
|
<!-- Provide a quick summary of what the model is/does. -->

Qwen/Qwen2.5-1.5B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on the GSM8K grade-school math dataset to strengthen chain-of-thought reasoning.
|
|
|
|
|
|
|
## Model Details |
|
|
|
### Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
|
|
This is the model card of a 🤗 transformers model that has been pushed to the Hub. The model is Qwen/Qwen2.5-1.5B-Instruct fine-tuned with GRPO (Group Relative Policy Optimization) on the GSM8K dataset to improve step-by-step (chain-of-thought) reasoning on grade-school math word problems.
|
|
|
- **Developed by:** Abaryan

- **Funded by [optional]:** [More Information Needed]

- **Shared by:** Abaryan

- **Model type:** Causal language model fine-tuned with GRPO for chain-of-thought (CoT) reasoning

- **Language(s) (NLP):** English

- **License:** [More Information Needed]

- **Finetuned from model:** [Qwen/Qwen2.5-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct)
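
For reference, a minimal inference sketch with 🤗 transformers. The Hub repo id below is a placeholder, since this card does not state the final model id:

```python
# Minimal inference sketch with 🤗 transformers.
# NOTE: "abaryan/<model-id>" is a placeholder -- substitute the actual Hub id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "abaryan/<model-id>"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the model was trained in bf16
    device_map="auto",
)

# A GSM8K-style math word problem, formatted with the Qwen chat template.
messages = [
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in April, "
                                "and then she sold half as many clips in May. "
                                "How many clips did Natalia sell altogether?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```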
|
|
|
## Training Details |
|
|
|
### Training Data |
|
|
|
|
|
The model was fine-tuned on [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) (GSM8K), a benchmark of roughly 8.5k grade-school math word problems, each paired with a step-by-step natural-language solution.
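
A quick sketch of loading the data with 🤗 datasets (the `main` config is assumed here; GSM8K also ships a `socratic` variant):

```python
from datasets import load_dataset

# GSM8K: ~7.5k training and ~1.3k test word problems.
ds = load_dataset("openai/gsm8k", "main")
example = ds["train"][0]
print(example["question"])  # the word problem
print(example["answer"])    # CoT solution ending in "#### <final answer>"
```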
|
|
|
|
|
#### Training Hyperparameters |
|
|
|
- **Training regime:** bf16 precision; full fine-tune (no quantisation, no LoRA)

- **Batch size:** 5

- **Generations per prompt:** 5

- **Training steps:** 3000
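
The card does not name the training framework. As an illustration only, the settings above map onto TRL's `GRPOTrainer` roughly as follows; the reward function is a stub and every name here is an assumption, not the author's actual script:

```python
# Hypothetical GRPO training sketch with TRL -- NOT the author's actual script.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

def correctness_reward(completions, **kwargs):
    # Stub reward: a real run would parse each completion's final answer
    # and compare it against the GSM8K gold answer.
    return [0.0 for _ in completions]

train_dataset = load_dataset("openai/gsm8k", "main", split="train")
train_dataset = train_dataset.rename_column("question", "prompt")  # GRPOTrainer expects a "prompt" column

config = GRPOConfig(
    output_dir="qwen2.5-1.5b-grpo-gsm8k",  # hypothetical output path
    per_device_train_batch_size=5,         # batch size 5, as listed above
    num_generations=5,                     # 5 generations per prompt
    max_steps=3000,                        # 3000 training steps
    bf16=True,                             # bf16, no quantisation, no LoRA
)
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=correctness_reward,
    args=config,
    train_dataset=train_dataset,
)
trainer.train()
```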
|
|
|
## Evaluation |
|
|
|
|
|
### Metrics
|
|
|
<!-- These are the evaluation metrics being used, ideally with a description of why. --> |
|
|
|
[More Information Needed] |
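
While the metrics are unspecified, GSM8K is conventionally scored with exact-match accuracy on the final numeric answer, which gold solutions mark with `#### <number>`. A minimal extractor sketch (illustrative, not the author's evaluation code):

```python
import re

def extract_final_answer(text: str) -> str | None:
    # GSM8K gold answers end with "#### <number>"; fall back to the
    # last number in the text for free-form model outputs.
    m = re.search(r"####\s*(-?[\d,.]+)", text)
    if m:
        return m.group(1).replace(",", "")
    nums = re.findall(r"-?\d[\d,.]*", text)
    return nums[-1].replace(",", "") if nums else None

assert extract_final_answer("She sold 72 clips in total.\n#### 72") == "72"
```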
|
|
|
### Results |
|
|
|
[More Information Needed] |
|
|
|
### Model Architecture and Objective |
|
|
|
Decoder-only Transformer (the Qwen2.5 architecture, 1.5B parameters), fine-tuned with the GRPO reinforcement-learning objective on GSM8K.
|
|
|
### Compute Infrastructure |
|
|
|
|
|
|
#### Hardware |
|
|
|
2x NVIDIA GeForce RTX 4080
|
|
|
#### Software |
|
|
|
CUDA 12.6, PyTorch 2.6
|
|
|
## Citation

**BibTeX:**
|
|
|
[More Information Needed] |
|
|
|
**APA:** |