---
license: mit
pipeline_tag: text-ranking
library_name: transformers
---

# Llama3-8b Reward Model

This is a Llama3-8b-based reward model trained with [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), an efficient RLHF framework presented in the paper [REINFORCE++: An Efficient RLHF Algorithm with Robustness to Both Prompt and Reward Models](https://huggingface.co/papers/2501.03262).

The model was trained on a combination of preference datasets, available at [OpenLLMAI/preference_700K](https://huggingface.co/datasets/OpenLLMAI/preference_700K).
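
To inspect the preference pairs the model was trained on, you can stream a few examples with the `datasets` library. This is a minimal sketch (it assumes the standard `train` split); the loop simply prints whichever fields the dataset exposes, so no column names are assumed:

```python
from datasets import load_dataset

# Stream a handful of preference pairs without downloading the full dataset.
ds = load_dataset("OpenLLMAI/preference_700K", split="train", streaming=True)
for i, example in enumerate(ds):
    print(example.keys())  # field names as defined by the dataset
    if i == 2:
        break
```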

Base SFT model: [OpenRLHF/Llama-3-8b-sft-mixture](https://huggingface.co/OpenRLHF/Llama-3-8b-sft-mixture)

## Training Configuration

```
Cosine Scheduler
Learning Rate: 9e-6
Warmup Ratio: 0.03
Batch Size: 256
Epoch: 1
```
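
Reward models of this kind are trained with a pairwise ranking objective over chosen/rejected response pairs. The snippet below is an illustrative sketch of that Bradley-Terry style loss, not OpenRLHF's exact implementation:

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push chosen rewards above rejected rewards."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar rewards for a batch of three preference pairs
chosen = torch.tensor([1.2, 0.3, 2.0])
rejected = torch.tensor([0.7, 0.5, -1.0])
print(pairwise_reward_loss(chosen, rejected))  # decreases as chosen rewards exceed rejected ones
```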

## Usage

You can use this model with the Hugging Face `transformers` library to score the quality of a generated response to a given prompt. The input format should match what the model was trained on (e.g., a full conversation turn using the Llama 3 chat template).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "OpenRLHF/Llama-3-8b-rm-mixture"  # this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load with an appropriate dtype, e.g. torch.bfloat16 for Llama 3 models
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Example: score two responses to the same user prompt
prompt = "Write a short poem about a cat."
response_good = (
    "A feline friend, soft and sleek,\n"
    "Curled up warm, a purring peek.\n"
    "Through sunlit naps and playful chase,\n"
    "Graceful paws in every space."
)
response_bad = "Cats are okay. They sit sometimes. Dog is better."

# Apply the chat template to the full conversation turn (user prompt + assistant response).
# `apply_chat_template` structures and tokenizes the input as the model expects.
messages_good = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response_good},
]
messages_bad = [
    {"role": "user", "content": prompt},
    {"role": "assistant", "content": response_bad},
]

input_ids_good = tokenizer.apply_chat_template(
    messages_good, return_tensors="pt", add_generation_prompt=False
).to(model.device)
input_ids_bad = tokenizer.apply_chat_template(
    messages_bad, return_tensors="pt", add_generation_prompt=False
).to(model.device)

# Get scalar reward scores
with torch.no_grad():
    score_good = model(input_ids_good).logits.item()
    score_bad = model(input_ids_bad).logits.item()

print(f"Score for good response: {score_good:.2f}")
print(f"Score for bad response: {score_bad:.2f}")
```
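
The raw scores are only meaningful relative to each other for the same prompt. Continuing from the example above, a sigmoid over the score difference gives a Bradley-Terry style estimate of how likely the first response is to be preferred; this post-processing is an illustration, not something prescribed by the model:

```python
# Probability that response_good is preferred over response_bad,
# under the Bradley-Terry assumption used for pairwise reward training.
preference_prob = torch.sigmoid(torch.tensor(score_good - score_bad)).item()
print(f"P(good preferred over bad): {preference_prob:.2%}")
```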