Ejafa/qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5

Ejafa committed on Jun 25

Commit 3a859d7 • 1 Parent(s): 27c750e

Update README.md

Files changed (1)

README.md +5 -0

@@ -18,7 +18,12 @@ model-index:
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
 # qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5
 This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the princeton-nlp/llama3-ultrafeedback dataset.

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
+## Description
+This model was trained as part of the Reinforcement Learning - 24 project at Peking University, focusing on [simpo].
+## Authors
+- Ejafa Bassam
+- Yaroslav Ponomarenko
 # qwen2-0.5b-instruct-simpo-lr-5e-07-gamma-1.5
 This model is a fine-tuned version of [Qwen/Qwen2-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2-0.5B-Instruct) on the princeton-nlp/llama3-ultrafeedback dataset.
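
For readers unfamiliar with the SimPO objective named in the description, the following is a minimal sketch of its per-example loss, not the training code used for this model. It assumes that `gamma-1.5` in the model name denotes the SimPO target reward margin; the `beta` default and the function name `simpo_loss` are hypothetical placeholders chosen for illustration, since the README does not state them.

```python
import math


def simpo_loss(chosen_logp_sum, chosen_len, rejected_logp_sum, rejected_len,
               beta=2.0, gamma=1.5):
    """Per-example SimPO loss from summed token log-probabilities.

    beta is an assumed placeholder value; gamma=1.5 is read off the model
    name under the assumption that it is the target reward margin.
    """
    # Length-normalized implicit rewards: beta * average token log-probability.
    reward_chosen = beta * chosen_logp_sum / chosen_len
    reward_rejected = beta * rejected_logp_sum / rejected_len

    # The chosen response should beat the rejected one by at least gamma.
    margin = reward_chosen - reward_rejected - gamma

    # -log(sigmoid(margin)) computed in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


if __name__ == "__main__":
    # Chosen: 10 tokens averaging -1.0 log-prob; rejected: 10 tokens averaging -3.0.
    print(simpo_loss(-10.0, 10, -30.0, 10))
```

Because the rewards are averaged over tokens, no reference model is needed, and a larger `gamma` demands a wider gap between the chosen and rejected responses before the loss becomes small.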