line-corporation/p-sacpo

akifumiwachi committed on Jun 21, 2024

Commit 2901c06 (verified) · 1 parent: 2bbc875

Update README.md

Files changed (1)

README.md +1 -0

@@ -32,6 +32,7 @@ tags:
 - **Fine-tuned from model:** [Alpaca (reprod.)](https://huggingface.co/PKU-Alignment/alpaca-7b-reproduced) (reproduced version of [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca))
 - **Dataset:** [PKU-SafeRLHF-30K](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-30K)
 - **SACPO Paper:** <https://arxiv.org/abs/2404.11049>
+- **GitHub:** <https://github.com/line/sacpo>
 - **Model Alias:** P-SACPO 0.75
 ## Usage: How to Talk with the Model