wzhouad committed on
Commit
e7c6e9c
1 Parent(s): 3a3f99d

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -12,7 +12,7 @@ Llama3-Instruct-8B model finetuned by hybrid WPO, utilizing three types of data:
  2. On-policy sampled Llama outputs based on Ultrafeedback prompts.
  3. GPT-4-turbo outputs based on Ultrafeedback prompts.

- In comparison to the Llama3-Instruct-8B-WPO-HB model, it employs an enhanced preference data construction method:
+ In comparison to the preference data construction method in our paper, it employs a method:
  1. Uses the response with the minimum score as the rejected one.
  2. When multiple outputs have the same highest score, the one with the shortest length is selected.
  3. When multiple outputs have the same minimum score, the one with the smallest length difference from the chosen output is selected.
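
The three selection rules in the README text above can be sketched roughly as follows. This is an illustration only, not the authors' code: the `build_preference_pair` name and the `(text, score)` tuple layout are hypothetical, length is assumed to mean character count (the README does not say whether it means characters or tokens), and returning `None` when every output has the same score is an assumption the README does not address.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: (response text, score) per sampled output.
Candidate = Tuple[str, float]


def build_preference_pair(
    candidates: List[Candidate],
) -> Optional[Tuple[str, str]]:
    """Return (chosen, rejected) for one prompt, or None if all scores tie."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate outputs")

    max_score = max(score for _, score in candidates)
    min_score = min(score for _, score in candidates)

    # Assumption: with identical scores there is no preference signal to use.
    if max_score == min_score:
        return None

    # Rule 2: among the highest-scored outputs, take the shortest one.
    top = [text for text, score in candidates if score == max_score]
    chosen = min(top, key=len)

    # Rules 1 and 3: the rejected output has the minimum score; ties are
    # broken by the smallest length difference from the chosen output.
    bottom = [text for text, score in candidates if score == min_score]
    rejected = min(bottom, key=lambda text: abs(len(text) - len(chosen)))

    return chosen, rejected


if __name__ == "__main__":
    outputs = [
        ("aaaa", 9.0),
        ("aa", 9.0),
        ("cc", 5.0),
        ("bbbbbbbb", 1.0),
        ("bbb", 1.0),
    ]
    print(build_preference_pair(outputs))  # ('aa', 'bbb')
```

In the example, two outputs share the top score and the shorter one (`"aa"`) becomes the chosen response. Two outputs share the bottom score and `"bbb"` is rejected because its length is closer to that of the chosen response.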