Model Card for rm_sweep_160k_with_c

This model is a fine-tuned version of Qwen/Qwen3-4B-Base on the aq1048576/sexism_filter_160k_trl_format_with_c dataset. It has been trained using TRL.

Quick start

from transformers import pipeline

# Prompt sent to the model as a single user turn.
question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
# device="cuda" assumes a GPU is available; use device="cpu" otherwise.
generator = pipeline("text-generation", model="aq1048576/rm_sweep_160k_with_c", device="cuda")
# return_full_text=False returns only the newly generated text, not the prompt.
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])

Training procedure

This model was trained as a reward model using TRL (trainer name: Reward).
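The card does not spell out the training objective. Reward models are commonly fit on (chosen, rejected) response pairs with a pairwise logistic loss, -log(sigmoid(r_chosen - r_rejected)), averaged over the batch. The sketch below illustrates that loss in plain Python. It assumes this objective applies here, which the card does not confirm, and the function name `pairwise_reward_loss` is hypothetical, not part of TRL.

```python
import math


def pairwise_reward_loss(chosen_rewards, rejected_rewards):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over a batch of pairs.

    Illustrative only: assumes a pairwise logistic objective, which this
    model card does not confirm.
    """
    if len(chosen_rewards) != len(rejected_rewards):
        raise ValueError("chosen and rejected reward lists must have equal length")
    if not chosen_rewards:
        raise ValueError("at least one pair is required")

    total = 0.0
    for r_chosen, r_rejected in zip(chosen_rewards, rejected_rewards):
        margin = r_chosen - r_rejected
        # -log(sigmoid(x)) equals softplus(-x); this form avoids overflow
        # for large positive or negative margins.
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(chosen_rewards)


if __name__ == "__main__":
    # Scores are made up; in practice they would come from the reward model.
    print(pairwise_reward_loss([2.0, 0.5], [1.0, 1.5]))
```

When the two scores in a pair are equal the loss is log(2), about 0.693, and it shrinks toward zero as the chosen response is scored further above the rejected one.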

Framework versions

TRL: 0.18.2
Transformers: 4.52.4
PyTorch: 2.6.0
Datasets: 3.6.0
Tokenizers: 0.21.1
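To compare a local environment against the versions listed above, something like the following sketch may help. It uses only the standard library. The PyPI distribution names (in particular `torch` for PyTorch) and the helper name `find_version_mismatches` are assumptions for illustration; other versions may well work, so a mismatch is a hint rather than an error.

```python
from importlib import metadata

# Versions listed in this model card, keyed by assumed PyPI distribution name.
EXPECTED = {
    "trl": "0.18.2",
    "transformers": "4.52.4",
    "torch": "2.6.0",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def _installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_version_mismatches(expected=EXPECTED, lookup=_installed_version):
    """Map each package that differs to a (wanted, found) tuple."""
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        # Some wheels carry a local build suffix (for example "+cu124");
        # strip it before comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in find_version_mismatches().items():
        print(f"{package}: card lists {wanted}, found {found}")
```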

Citations

Cite TRL as:

@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}

Safetensors

Model size: 4.02B params
Tensor type: BF16
