GPT-2 LoRA Reward Model (Demo)
This is a demonstration LoRA-based reward model fine-tuned from GPT-2 for RLHF (Reinforcement Learning from Human Feedback) applications.
Model Details
- Base Model: gpt2
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Model Type: Reward Model for RLHF
- Training Date: 2025-08-10
- Purpose: Educational/Demo
LoRA Configuration
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # rank of the low-rank update matrices
    lora_alpha=16,                        # scaling factor (alpha / r scales the update)
    target_modules=["c_attn", "c_proj"],  # GPT-2 attention and projection layers
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS",                  # sequence classification head for reward modeling
)
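For reference, this is roughly how the configuration above would be applied when training the adapters, following the standard PEFT workflow (a sketch; the actual training script is not included in this card):

from transformers import AutoModelForSequenceClassification
from peft import get_peft_model

# Wrap the base model with the LoRA config; only the adapter weights
# (and the new classification head) remain trainable.
base = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
peft_model = get_peft_model(base, lora_config)
peft_model.print_trainable_parameters()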
Usage
Loading the Model
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model with a single-logit head (the reward score)
base_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=1,  # one logit = scalar reward score
    torch_dtype=torch.float16,
    device_map="auto",
)

# Load the LoRA adapters on top of the base model
model = PeftModel.from_pretrained(
    base_model,
    "gandhiraketla277/demo-lora-reward-model",
)
model.eval()

# Load the tokenizer; GPT-2 has no pad token, so reuse EOS
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id  # required for padded batches
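If you do not need to swap adapters at runtime, the LoRA weights can optionally be folded into the base weights using PEFT's standard merge call, which removes the adapter indirection at inference time (a sketch):

# Fold the adapter weights into the base model; the result is a
# plain transformers model with no PEFT wrapper overhead.
merged_model = model.merge_and_unload()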
Computing Reward Scores
def get_reward_score(text, model, tokenizer):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512,
    )
    # Move inputs to the model's device (device_map="auto" may have
    # placed the model on GPU)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    # The single classification logit is the scalar reward
    reward_score = outputs.logits.squeeze().item()
    return reward_score

# Example usage
text = "This is a helpful and accurate response."
score = get_reward_score(text, model, tokenizer)
print(f"Reward score: {score:.3f}")
Training Details
- Framework: Transformers + PEFT
- Model Size: ~124M parameters (base) + LoRA adapters
- LoRA Parameters: ~300K trainable parameters
- Training Type: Demonstration/Educational
Use Cases
This reward model can be used for:
- RLHF training pipelines (see the pairwise-loss sketch after this list)
- Response quality assessment
- Preference learning experiments
- Educational purposes
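Reward models of this kind are typically trained on preference pairs with a Bradley-Terry style ranking loss. A minimal sketch, assuming already-tokenized chosen/rejected batches (the names chosen_inputs and rejected_inputs are hypothetical, not from this card):

import torch.nn.functional as F

def pairwise_ranking_loss(model, chosen_inputs, rejected_inputs):
    # Score both completions with the same reward model
    r_chosen = model(**chosen_inputs).logits.squeeze(-1)
    r_rejected = model(**rejected_inputs).logits.squeeze(-1)
    # Bradley-Terry objective: push the chosen score above the rejected one
    return -F.logsigmoid(r_chosen - r_rejected).mean()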
Limitations
- This is a demo model for educational purposes
- Not trained on extensive preference data
- Performance may vary on out-of-distribution inputs
- Should not be used for production applications
Citation
@misc{demo-lora-reward-2025,
  title={Demo LoRA Reward Model},
  author={gandhiraketla277},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/gandhiraketla277/demo-lora-reward-model}
}
License
This model is released under the MIT License for educational use.