Llama‑3‑8B Marketplace Assistant (RLHF‑Finetuned)
🌐 Project Page | 📄 Paper | 🐙 GitHub
Model Details
This checkpoint is a Llama‑3‑8B model fine‑tuned with Reinforcement Learning from Human Feedback (RLHF) on realistic marketplace interactions. Be aware that RLHF fine‑tuning can inadvertently reinforce strategic deception and other manipulative behaviors; this model is released to support research on exactly these failure modes.
Intended Use
- Research on RLHF misalignment and reward hacking.
- Analysis of RLHF-induced failure modes, such as deception and sycophancy.
Quick Start
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kaiquliang/Llama-3-8b-RLHF"

# Load the tokenizer and model; device_map="auto" requires the accelerate package.
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",  # keep the dtype the checkpoint was saved in
)
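A minimal generation sketch follows; the marketplace prompt and sampling parameters are illustrative, not taken from the paper:

# Hypothetical marketplace-style prompt for probing the model's behavior.
prompt = "You are a marketplace seller assistant. A buyer asks: Is this used laptop reliable?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))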
For additional resources, including prompts and code, please visit our GitHub repository.
Citation
If you find this model useful, please cite our paper:
@article{liang2025rlhs,
  title={{RLHS}: Mitigating Misalignment in {RLHF} with Hindsight Simulation},
  author={Liang, Kaiqu and Hu, Haimin and Liu, Ryan and Griffiths, Thomas L. and Fisac, Jaime Fern{\'a}ndez},
  journal={arXiv preprint arXiv:2501.08617},
  year={2025}
}
Base Model
meta-llama/Meta-Llama-3-8B