abaryan/GRPO_GSM8K_Qwen2.5-1.5B_NoQuantisation

Reinforcement Learning · text-generation · text-generation-inference

Abaryan committed on Mar 18

Commit b23cf82 · verified · 1 Parent(s): 69a6f42

Update README.md

Files changed (1)

README.md +5 -5

@@ -17,13 +17,13 @@ tags: []
 This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
+- **Developed by:** [Abaryan]
 - **Funded by [optional]:** [More Information Needed]
 - **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
+- **Model type:** [GRPO + CoT]
 - **Language(s) (NLP):** [More Information Needed]
 - **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
+- **Finetuned from model [Qwen_2.5_1.5b]:** [More Information Needed]
 ### Model Sources [optional]
@@ -79,7 +79,7 @@ Use the code below to get started with the model.
 <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
+[GSM8K]
 ### Training Procedure
@@ -98,7 +98,7 @@ Use the code below to get started with the model.
 <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
+[bf16, no quantisation, no LoRA,Batch_size=5, num of generation = 5, 3000_steps]
 ## Evaluation
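
The training line in this commit lists "num of generation = 5" for GRPO on GSM8K but does not show how the sampled completions are scored. The sketch below illustrates the general GRPO idea of scoring a group of completions for one prompt and normalising each reward against the group. The reward function, the `eps` value, the use of the population standard deviation, and all function names are assumptions made for illustration; none of them are taken from this repository. The only dataset detail relied on is that GSM8K reference solutions end with `#### <answer>`.

```python
import re
from statistics import mean, pstdev

NUM_GENERATIONS = 5  # matches "num of generation = 5" in the commit


def gold_answer(solution: str) -> str:
    """Extract the final answer from a GSM8K reference solution ('... #### 18')."""
    return solution.split("####")[-1].strip().replace(",", "")


def correctness_reward(completion: str, solution: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the completion equals the gold answer."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
    return 1.0 if numbers and numbers[-1] == gold_answer(solution) else 0.0


def group_relative_advantages(rewards, eps=1e-4):
    """Normalise rewards within one group: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    solution = "She sells 9 eggs at $2 each. 9 * 2 = 18 #### 18"
    # Five made-up completions standing in for one sampled group.
    completions = [
        "9 eggs times 2 dollars is 18",
        "The answer is 16",
        "She makes 20 dollars",
        "9 * 2 = 18, so the answer is 18.",
        "I think it is 12",
    ]
    assert len(completions) == NUM_GENERATIONS
    rewards = [correctness_reward(c, solution) for c in completions]
    for c, a in zip(completions, group_relative_advantages(rewards)):
        print(f"{a:+.3f}  {c}")
```

Completions that beat the group average receive a positive advantage and the rest a negative one, so no separate value model is needed. When every completion in a group gets the same reward, all advantages are zero and that prompt contributes no learning signal.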