---
license: apache-2.0
datasets:
- lvwerra/stack-exchange-paired
language:
- en
library_name: adapter-transformers
pipeline_tag: text-generation
tags:
- reward_model
---
## Reward Model GPT2
A [GPT2](https://huggingface.co/gpt2) model fine-tuned as a reward model.
The model is designed to score responses to questions in the [Stack Exchange](https://huggingface.co/datasets/lvwerra/stack-exchange-paired) domains of programming, mathematics, physics, and more, so that answers people preferred receive higher rewards.
For the training code, see the GitHub [example](https://github.com/huggingface/trl/blob/main/examples/research_projects/stack_llama/scripts/reward_modeling.py).
Training info:
* epoch: 1.0
* train_loss: 0.641692199903866
* eval_loss: 0.6299035549163818
* eval_accuracy: 0.729
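
To put the numbers above in context, here is a minimal sketch of how a pairwise reward loss and accuracy can be computed. It assumes the linked training script optimizes the usual pairwise ranking objective, `-log(sigmoid(r_chosen - r_rejected))`, averaged over answer pairs, and that accuracy is the fraction of pairs where the preferred answer scores higher. The function names and the scores are hypothetical and are not outputs of this model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def pairwise_metrics(chosen, rejected):
    """Return (mean pairwise loss, accuracy) for paired reward scores.

    Assumption: loss = -log(sigmoid(r_chosen - r_rejected)) per pair, and a
    pair counts as correct when the chosen answer scores strictly higher.
    """
    if not chosen or len(chosen) != len(rejected):
        raise ValueError("expected two non-empty lists of equal length")
    diffs = [c - r for c, r in zip(chosen, rejected)]
    loss = -sum(log_sigmoid(d) for d in diffs) / len(diffs)
    accuracy = sum(d > 0 for d in diffs) / len(diffs)
    return loss, accuracy


# Hypothetical reward scores for four (preferred, rejected) answer pairs.
chosen_scores = [1.2, 0.3, -0.5, 2.0]
rejected_scores = [0.4, 0.9, -1.0, 0.1]

loss, accuracy = pairwise_metrics(chosen_scores, rejected_scores)

# A model that gives every answer the same score has a loss of ln(2).
chance_loss, _ = pairwise_metrics([0.0], [0.0])

print(f"loss={loss:.3f} accuracy={accuracy:.2f} chance_loss={chance_loss:.3f}")
```

Under this assumption, a model that cannot tell the two answers apart has a loss of ln 2 ≈ 0.693, so the reported eval_loss of about 0.630 sits only modestly below that baseline even though the preferred answer is ranked first in 72.9% of evaluation pairs.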