---
license: mit
pipeline_tag: text-ranking
library_name: transformers
---

# Llama3-8b Reward Model

This is a Llama3-8b-based reward model trained with [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), an efficient RLHF framework presented in the paper [REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models](https://huggingface.co/papers/2501.03262).

The model was trained on a combination of preference datasets, available at [OpenLLMAI/preference_700K](https://huggingface.co/datasets/OpenLLMAI/preference_700K).
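
To inspect the preference pairs the model was trained on, you can stream a few examples with the `datasets` library. This is a minimal sketch (it assumes the standard `train` split); the loop simply prints whichever fields the dataset exposes, so no column names are assumed:

```python
from datasets import load_dataset

# Stream a handful of preference pairs without downloading the full dataset.
ds = load_dataset("OpenLLMAI/preference_700K", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example.keys())  # field names as defined by the dataset
    if i == 2:
        break
```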

Base SFT model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)

## Training Configuration

```
Cosine Scheduler
Learning Rate: 9e-6
Warmup Ratio: 0.03
Batch Size: 256
Epoch: 1
```
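
Reward models of this kind are trained with a pairwise ranking objective over chosen/rejected response pairs. The snippet below is an illustrative sketch of that Bradley-Terry style loss, not OpenRLHF's exact implementation:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push chosen rewards above rejected rewards."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar rewards for a batch of three preference pairs
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.7, 0.5, -1.0])
print(pairwise_reward_loss(chosen, rejected))  # decreases as chosen rewards exceed rejected ones
```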

## Usage

You can use this model with the Hugging Face `transformers` library to score the quality of a generated response to a given prompt. The input format should match what the model was trained on (e.g., a full conversation turn using the Llama 3 chat template).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "OpenRLHF/Llama-3-8b-rm-mixture"  # this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load with an appropriate dtype, e.g. torch.bfloat16 for Llama 3 models
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example: score two responses to the same user prompt
prompt = "Write a short poem about a cat."
response_good = (
    "A feline friend, soft and sleek,\n"
    "Curled up warm, a purring peek.\n"
    "Through sunlit naps and playful chase,\n"
    "Graceful paws in every space."
)
response_bad = "Cats are okay. They sit sometimes. Dog is better."

# Apply the chat template to the full conversation turn (user prompt + assistant response).
# `apply_chat_template` structures and tokenizes the input as the model expects.
messages_good = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response_good},
]
messages_bad = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response_bad},
]

input_ids_good = tokenizer.apply_chat_template(
    messages_good, return_tensors="pt", add_generation_prompt=False
).to(model.device)
input_ids_bad = tokenizer.apply_chat_template(
    messages_bad, return_tensors="pt", add_generation_prompt=False
).to(model.device)

# Get scalar reward scores
with torch.no_grad():
    score_good = model(input_ids_good).logits.item()
    score_bad = model(input_ids_bad).logits.item()

print(f"Score for good response: {score_good:.2f}")
print(f"Score for bad response: {score_bad:.2f}")
```
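
The raw scores are only meaningful relative to each other for the same prompt. Continuing from the example above, a sigmoid over the score difference gives a Bradley-Terry style estimate of how likely the first response is to be preferred; this post-processing is an illustration, not something prescribed by the model:

```python
# Probability that response_good is preferred over response_bad,
# under the Bradley-Terry assumption used for pairwise reward training.
preference_prob = torch.sigmoid(torch.tensor(score_good - score_bad)).item()
print(f"P(good preferred over bad): {preference_prob:.2%}")
```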