---
language: en
tags:
- rlhf
- final-model
- irl
- pythia-410m
library_name: transformers
pipeline_tag: text-generation
---

# kybukre0-rlhf-checkpoint-pythia-410m-irl

This is the final RLHF model, trained with an IRL (inverse reinforcement learning) reward model.

## Model Information

- Base Model: EleutherAI/gpt-neo-125M
- Reward Type: irl
- Dataset: allenai/real-toxicity-prompts
- Final Toxicity Score: 12.1265

## IRL Configuration

- Likelihood Type: bradley_terry (see the sketch after this list)
- Normalization Strategy: none
- IRL Artifact: matthieubou-imperial-college-london/bayes_irl_vi/posterior_bradley_terry_rkiq5pd8:v0
- Use Raw Score: True
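For context, here is a minimal sketch of the Bradley-Terry preference likelihood that the setting above refers to, as it is commonly used for reward modeling. The function name and example scores are illustrative assumptions, not the actual training code:

```python
import torch
import torch.nn.functional as F

# Bradley-Terry: P(A preferred over B) = sigmoid(r_A - r_B), so the
# negative log-likelihood of observed preferences is
# -log sigmoid(r_chosen - r_rejected).
def bradley_terry_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Hypothetical reward scores for two preference pairs
loss = bradley_terry_loss(torch.tensor([1.2, 0.4]), torch.tensor([0.3, 0.9]))
print(loss)
```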

## Usage

This model can be loaded using the Hugging Face Transformers and TRL libraries:

```python
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead

# Load the RLHF-tuned policy (with value head) and its tokenizer
model = AutoModelForCausalLMWithValueHead.from_pretrained("MattBou00/kybukre0-rlhf-checkpoint-pythia-410m-irl")
tokenizer = AutoTokenizer.from_pretrained("MattBou00/kybukre0-rlhf-checkpoint-pythia-410m-irl")
```
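Once loaded, generation goes through the wrapped base model. A minimal sketch, where the prompt and sampling parameters are illustrative assumptions:

```python
# Generate a continuation; sampling parameters are illustrative
inputs = tokenizer("The weather today is", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```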

## Training Configuration

The training configuration is saved in `training_config.yaml`.
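A short sketch for fetching and inspecting that file. This assumes `training_config.yaml` sits at the repo root; the download and parsing calls are standard, but the file layout is an assumption:

```python
import yaml
from huggingface_hub import hf_hub_download

# Download training_config.yaml from the model repo and parse it
path = hf_hub_download(
    repo_id="MattBou00/kybukre0-rlhf-checkpoint-pythia-410m-irl",
    filename="training_config.yaml",
)
with open(path) as f:
    config = yaml.safe_load(f)
print(config)
```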


