hamishivi committed on
Commit
5ad851d
1 Parent(s): 6d52b05

Update README.md

Files changed (1)
  1. README.md +6 -5
README.md CHANGED
@@ -1,6 +1,6 @@
 ---
 model-index:
-- name: tulu-v2.5-llama3-8b-uf-mean-8b-uf-rm
+- name: llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm
 results: []
 datasets:
 - allenai/tulu-2.5-preference-data
@@ -14,7 +14,7 @@ license: apache-2.0
 <img src="https://huggingface.co/datasets/allenai/blog-images/resolve/main/tulu-2.5/tulu_25_banner.png" alt="Tulu 2.5 banner image" width="800px"/>
 </center>
 
-# Model Card for Tulu V2.5 PPO 13B - UltraFeedback Mean w. 8B UltraFeedback RM
+# Model Card for Llama 3 Tulu V2.5 PPO 13B - UltraFeedback Mean w. 8B UltraFeedback RM
 
 Tulu is a series of language models that are trained to act as helpful assistants.
 Tulu V2.5 is a series of models trained using DPO and PPO starting from the [Tulu 2 suite](https://huggingface.co/collections/allenai/tulu-v2-suite-6551b56e743e6349aab45101).
@@ -22,13 +22,14 @@ This model is trained on the UltraFeedback dataset (using the per-aspect/fine-gr
 We used a 8B RM trained on the UltraFeedback dataset, and then used the UltraFeedback prompts during PPO training.
 
 This is part of a small update to the original V2.5 suite, adding some Llama 3-based models. We add three models:
-- [allenai/tulu-v2.5-llama3-8b-uf-mean-8b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-8b-uf-rm) (this model)
-- [allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts)
-- [allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts) (best overall model)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-8b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-8b-uf-rm) (this model)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm-mixed-prompts](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts)
+- [allenai/llama-3-tulu-v2.5-8b-uf-mean-70b-uf-rm](https://huggingface.co/allenai/tulu-v2.5-llama3-8b-uf-mean-70b-uf-rm-mixed-prompts) (best overall model)
 
 For more details, read the paper:
 [Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback](https://arxiv.org/abs/2406.09279).
 
+Built with Meta Llama 3! Note that Llama 3 is released under the Meta Llama 3 community license, included here under llama_3_license.txt.
 
 ## .Model description
 