GPT-2 LoRA Reward Model (Demo)
This is a demonstration LoRA-based reward model fine-tuned from GPT-2 for RLHF (Reinforcement Learning from Human Feedback) applications.
Model Details
- Base Model: gpt2
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Model Type: Reward Model for RLHF
- Training Date: 2025-08-10
- Purpose: Educational/Demo
LoRA Configuration
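from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # Rank of the low-rank update matrices
    lora_alpha=16,                        # Scaling numerator; effective scale is lora_alpha / r
    target_modules=["c_attn", "c_proj"],  # GPT-2 projection layers to adapt
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS"  # Sequence Classification for reward modeling
)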
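As a reminder of what these settings control: LoRA freezes each targeted weight matrix W and learns a low-rank update that is scaled by lora_alpha / r. With the values above, the scale is 16 / 8 = 2. This is the standard LoRA formulation rather than anything specific to this repository; the symbols d_in and d_out are introduced here for illustration.

```latex
h = W x + \frac{\alpha}{r}\, B A x,
\qquad A \in \mathbb{R}^{r \times d_{\text{in}}},\quad
B \in \mathbb{R}^{d_{\text{out}} \times r},\quad
r = 8,\quad \frac{\alpha}{r} = \frac{16}{8} = 2
```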
Usage
Loading the Model
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch
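# Load base model with a single-output classification head
base_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=1,  # Single logit used as the reward score
    torch_dtype=torch.float16,
    device_map="auto"  # Requires the accelerate package
)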
# Load LoRA adapters
model = PeftModel.from_pretrained(
    base_model, 
    "gandhiraketla277/demo-lora-reward-model"
)
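# Load tokenizer; GPT-2 has no pad token, so reuse the EOS token
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
# The classification head uses the pad id to find the last real token in padded batches
model.config.pad_token_id = tokenizer.pad_token_id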
Computing Reward Scores
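def get_reward_score(text, model, tokenizer):
    # Tokenize a single string and move the tensors to the model's device
    device = next(model.parameters()).device
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512
    ).to(device)

    with torch.no_grad():
        outputs = model(**inputs)
        # logits has shape (1, 1) for one input; extract the scalar reward
        reward_score = outputs.logits.squeeze().item()

    return reward_score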
# Example usage
text = "This is a helpful and accurate response."
score = get_reward_score(text, model, tokenizer)
print(f"Reward score: {score:.3f}")
Training Details
- Framework: Transformers + PEFT
- Model Size: ~124M parameters (base) + LoRA adapters
- LoRA Parameters: ~300K trainable parameters
- Training Type: Demonstration/Educational
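The adapter size can be estimated by hand from the LoRA configuration. The sketch below is a back-of-envelope count, not a readout from the published checkpoint. It assumes GPT-2 small dimensions (hidden size 768, 12 layers) and that the "c_proj" pattern matches both the attention output projection and the MLP output projection in each block. The helper name lora_param_count is hypothetical, and any trainable classification head is excluded.

```python
def lora_param_count(shapes, r):
    """Sum LoRA parameters over adapted weights.

    Each adapted weight with (fan_in, fan_out) adds two matrices:
    A with r * fan_in entries and B with fan_out * r entries.
    """
    return sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes)


HIDDEN = 768  # GPT-2 small hidden size (assumption stated above)
LAYERS = 12   # Number of transformer blocks
R = 8         # LoRA rank from the configuration

c_attn = (HIDDEN, 3 * HIDDEN)      # Fused query/key/value projection
attn_c_proj = (HIDDEN, HIDDEN)     # Attention output projection
mlp_c_proj = (4 * HIDDEN, HIDDEN)  # MLP output projection

c_attn_only = LAYERS * lora_param_count([c_attn], R)
all_matches = LAYERS * lora_param_count([c_attn, attn_c_proj, mlp_c_proj], R)

print(f"c_attn only:            {c_attn_only:,}")
print(f"c_attn + both c_proj:   {all_matches:,}")
```

Under these assumptions, adapting c_attn alone gives 294,912 parameters, close to the ~300K figure, while matching both c_proj layers as well gives 811,008. Calling print_trainable_parameters() on the loaded PEFT model reports the actual number.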
Use Cases
This reward model can be used for:
- RLHF training pipelines
- Response quality assessment
- Preference learning experiments
- Educational purposes
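For the preference learning use case, two reward scores are commonly compared with the Bradley-Terry model, where the probability that response A is preferred over response B is sigmoid(score_A - score_B). The following is a minimal sketch under that assumption. The helper names preference_probability and pairwise_loss are hypothetical, the inputs are plain floats such as those returned by get_reward_score, and nothing here states that this model was trained with this exact objective.

```python
import math


def preference_probability(score_a: float, score_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over B."""
    diff = score_a - score_b
    # Numerically stable sigmoid: avoid exp() of a large positive number
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def pairwise_loss(score_chosen: float, score_rejected: float) -> float:
    """Negative log-likelihood that the chosen response wins.

    Equal to log(1 + exp(-(chosen - rejected))), computed in a stable form.
    """
    diff = score_chosen - score_rejected
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    # Illustrative scores only, not outputs of the published model
    print(f"P(A > B) = {preference_probability(1.2, 0.4):.3f}")
    print(f"loss     = {pairwise_loss(1.2, 0.4):.3f}")
```

The loss shrinks as the chosen response's score rises above the rejected one, which is why this form is a common choice when fitting a reward model to pairwise comparisons.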
Limitations
- This is a demo model for educational purposes
- Not trained on extensive preference data
- Performance may vary on out-of-distribution inputs
- Should not be used for production applications
Citation
@misc{demo-lora-reward-2025,
    title={Demo LoRA Reward Model},
    author={gandhiraketla277},
    year={2025},
    publisher={Hugging Face},
    url={https://huggingface.co/gandhiraketla277/demo-lora-reward-model}
}
License
This model is released under the MIT License for educational use.