Update README.md
README.md CHANGED

@@ -2,7 +2,7 @@
 base_model: meta-llama/Meta-Llama-3-8B-Instruct
 library_name: transformers
 datasets:
--
+- wzhouad/llama3-ultrafeedback-hybrid-v2
 tags:
 - alignment-handbook
 - llama
@@ -22,6 +22,8 @@ In comparison to the preference data construction method in our paper, it employ
 2. When multiple outputs have the same highest score, the one with the shortest length is selected.
 3. When multiple outputs have the same minimum score, the one with the smallest length difference from the chosen output is selected.
 
+The model is trained based on [wzhouad/llama3-ultrafeedback-hybrid-v2](https://huggingface.co/datasets/wzhouad/llama3-ultrafeedback-hybrid-v2).
+
 ### [AlpacaEval Eval Results](https://tatsu-lab.github.io/alpaca_eval/)
 | Model | LC | WR | Avg. Length |
 |-------------------------------------------|:------------:|:--------:|:-----------:|
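For readers who want to see the tie-breaking rules from the README context above in code, here is a minimal Python sketch: the highest-scored output is kept as the chosen response (shortest one on score ties, rule 2) and the lowest-scored output as the rejected response (the one closest in length to the chosen output on score ties, rule 3). The function name, the `(text, score)` tuple layout, and treating the top-scored output as "chosen" at all are assumptions for illustration; they are not part of this commit.

```python
def build_preference_pair(outputs):
    """Pick a (chosen, rejected) pair from a list of (text, score) tuples.

    Assumption: the highest-scored output is chosen and the lowest-scored is
    rejected; only the tie-breaking (rules 2 and 3) is taken from the README.
    """
    # Chosen: highest score; on ties, the shortest output (rule 2).
    best_score = max(score for _, score in outputs)
    chosen = min(
        (text for text, score in outputs if score == best_score),
        key=len,
    )

    # Rejected: lowest score; on ties, the output whose length is closest
    # to the chosen output's length (rule 3).
    worst_score = min(score for _, score in outputs)
    rejected = min(
        (text for text, score in outputs if score == worst_score),
        key=lambda text: abs(len(text) - len(chosen)),
    )
    return chosen, rejected


# Hypothetical usage with toy scored completions:
outputs = [("short answer", 0.9), ("a much longer answer", 0.9), ("bad answer", 0.1)]
chosen, rejected = build_preference_pair(outputs)
```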