GPT-2 LoRA Reward Model (Demo)

This is a demonstration LoRA-based reward model fine-tuned from GPT-2 for RLHF (Reinforcement Learning from Human Feedback) applications.

Model Details

  • Base Model: gpt2
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Model Type: Reward Model for RLHF
  • Training Date: 2025-08-10
  • Purpose: Educational/Demo

LoRA Configuration

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # rank of the LoRA update matrices
    lora_alpha=16,                        # scaling factor applied to the LoRA updates
    target_modules=["c_attn", "c_proj"],  # GPT-2 attention and projection layers
    lora_dropout=0.1,
    bias="none",
    task_type="SEQ_CLS"                   # sequence classification head for reward modeling
)
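
For reference, this configuration can be applied to the base model with PEFT's get_peft_model before training. The sketch below is illustrative (the variable names are not taken from the original training code) and prints the trainable-parameter count quoted under Training Details:

from transformers import AutoModelForSequenceClassification
from peft import get_peft_model

# Wrap a fresh GPT-2 classification head (single reward logit) with the LoRA config above
base = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=1)
peft_model = get_peft_model(base, lora_config)

# Only the LoRA matrices (and the classification head) are trainable; the base weights stay frozen
peft_model.print_trainable_parameters()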

Usage

Loading the Model

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForSequenceClassification.from_pretrained(
    "gpt2",
    num_labels=1,  # Reward score
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA adapters
model = PeftModel.from_pretrained(
    base_model, 
    "gandhiraketla277/demo-lora-reward-model"
)

# Load tokenizer (GPT-2 has no pad token, so reuse the EOS token)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id  # needed for padded batches with GPT-2 classification
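
If desired, the adapters can also be merged back into the base weights so the model runs as a plain Transformers checkpoint. A minimal sketch using PEFT's merge_and_unload (the output path is illustrative):

# Optional: fold the LoRA weights into the base model and save a standalone checkpoint
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gpt2-reward-merged")   # illustrative output directory
tokenizer.save_pretrained("gpt2-reward-merged")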

Computing Reward Scores

def get_reward_score(text, model, tokenizer):
    inputs = tokenizer(
        text,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=512
    )
    # Move inputs to the model's device (device_map="auto" may place it on GPU)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}

    with torch.no_grad():
        outputs = model(**inputs)
        reward_score = outputs.logits.squeeze().item()

    return reward_score

# Example usage
text = "This is a helpful and accurate response."
score = get_reward_score(text, model, tokenizer)
print(f"Reward score: {score:.3f}")

Training Details

  • Framework: Transformers + PEFT
  • Model Size: ~124M parameters (base) + LoRA adapters
  • LoRA Parameters: ~300K trainable parameters
  • Training Type: Demonstration/Educational

Use Cases

This reward model can be used for:

  • RLHF training pipelines
  • Response quality assessment
  • Preference learning experiments (see the loss sketch after this list)
  • Educational purposes
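
When a reward model like this is trained on preference pairs, the usual objective is a pairwise Bradley-Terry style loss on the scalar rewards. The sketch below shows that loss in isolation; the batch construction is illustrative and not taken from this model's training code:

import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    # Bradley-Terry objective: push the chosen reward above the rejected reward
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Dummy reward scores for a batch of three preference pairs
chosen = torch.tensor([0.8, 1.2, 0.3])
rejected = torch.tensor([0.1, 0.9, 0.5])
print(f"Pairwise loss: {pairwise_reward_loss(chosen, rejected).item():.4f}")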

Limitations

  • This is a demo model for educational purposes
  • Not trained on extensive preference data
  • Performance may vary on out-of-distribution inputs
  • Should not be used for production applications

Citation

@misc{demo-lora-reward-2025,
    title={Demo LoRA Reward Model},
    author={gandhiraketla277},
    year={2025},
    publisher={Hugging Face},
    url={https://huggingface.co/gandhiraketla277/demo-lora-reward-model}
}

License

This model is released under the MIT License for educational use.
