allenai/Llama-3.1-Tulu-3-70B-DPO

vwxyzjn committed about 18 hours ago

Commit 10509a2 · verified · 1 Parent(s): 24fe4c5

Update README.md

Files changed (1)

README.md +2 -2


@@ -45,10 +45,10 @@ Tülu3 is designed for state-of-the-art performance on a diversity of tasks in a
 |-----------|-------------------|
 | **Base Model** | [meta-llama/llama-3.1-405B](https://huggingface.co/meta-llama/llama-3.1-405B) |
 | **SFT** | [allenai/llama-3.1-Tulu-3-405B-SFT](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B-SFT) |
-| **Final Model (DPO)** | [allenai/llama-3.1-Tulu-3-405B](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B) |
+| **DPO** | [allenai/llama-3.1-Tulu-3-405B-DPO](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B-DPO) |
+| **Final Model (RLVR)** | [allenai/llama-3.1-Tulu-3-405B](https://huggingface.co/allenai/llama-3.1-Tulu-3-405B) |
 | **Reward Model (RM)**| (Same as 8B)
 ## Using the model
 ### Loading with HuggingFace