This model is a fine-tuned version of Qwen2.5-0.5B-Instruct on the samhog/psychology-RLHF dataset using ORPO (Odds Ratio Preference Optimization). The primary objective was to experiment with preference alignment in the spirit of Reinforcement Learning from Human Feedback (RLHF) via ORPO. The dataset comes from the psychology domain, but the main goal of this fine-tuning was to study and demonstrate the effectiveness of ORPO for aligning small-scale instruction-tuned models.
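For orientation, ORPO preference training of this kind can be set up with trl's ORPOTrainer. The sketch below is illustrative rather than the exact script used for this model: the hyperparameters, the LoRA configuration, and the assumption that samhog/psychology-RLHF provides (or can be mapped to) prompt/chosen/rejected columns are assumptions, not details taken from this repository.

from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model_name = "unsloth/Qwen2.5-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# ORPOTrainer expects preference data with "prompt", "chosen" and "rejected" columns;
# map the dataset to that format first if its column names differ.
dataset = load_dataset("samhog/psychology-RLHF", split="train")

peft_config = LoraConfig(  # illustrative LoRA setup, since the released model ships a PEFT adapter
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

config = ORPOConfig(  # illustrative hyperparameters
    output_dir="psychology-orpo",
    beta=0.1,                      # weight of ORPO's odds-ratio term
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    num_train_epochs=1,
    max_length=1024,
    max_prompt_length=512,
)

trainer = ORPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,    # called `tokenizer=` in older trl releases
    peft_config=peft_config,
)
trainer.train()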
from huggingface_hub import login
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

login(token="")

# Load the base instruct model and its tokenizer, then attach the ORPO-tuned PEFT adapter
tokenizer = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-0.5B-Instruct")
base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen2.5-0.5B-Instruct",
    device_map={"": 0},
    token="",
)
model = PeftModel.from_pretrained(base_model, "khazarai/Psychology-RLHF")
prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{}
### Input:
{}
### Response:
{}"""
inputs = tokenizer(
    [
        prompt.format(
            "You are an AI assistant that helps people find information",
            "I'm having trouble with my teenage child. They're acting out and I don't know what to do.",
            "",  # leave the Response field empty so the model generates it
        )
    ],
    return_tensors="pt",
).to("cuda")
from transformers import TextStreamer

# Stream tokens to stdout as they are generated
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=512)
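Because the repository ships a PEFT adapter on top of the base model, the adapter can optionally be merged into the base weights for standalone use. The minimal sketch below assumes the adapter is LoRA-based; the output directory name is just an example.

# Merge the LoRA adapter into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("Psychology-RLHF-merged")
tokenizer.save_pretrained("Psychology-RLHF-merged")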
Training Metrics:
Interpretation:
Base model: Qwen/Qwen2.5-0.5B