# KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality
## Model Description

`KnowRL-Skywork-OR1-7B-Preview` is a slow-thinking language model that results from applying our KnowRL framework to the base model `Skywork-OR1-7B-Preview`.
The KnowRL (Knowledgeable Reinforcement Learning) framework is designed to mitigate hallucinations in Large Language Models (LLMs) by integrating external knowledge directly into the training process. This model was trained in two stages:

- **Cold-Start Supervised Fine-Tuning (SFT):** The model first aligns with factual thinking patterns on a high-quality dataset.
- **Knowledgeable Reinforcement Learning (RL):** The model is then further trained using a reward signal that explicitly encourages factual accuracy in its reasoning process, helping it learn its own knowledge boundaries.
As a result, this model demonstrates a significant reduction in hallucinations on factual benchmarks while preserving or even enhancing the strong reasoning capabilities inherited from its base model.
## How to Use

### Using the `transformers` Library

You can use this model with the `transformers` library for text generation tasks. To get the best results, it is important to follow the specific prompt format, which includes `<think>` and `<answer>` tags.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Set the device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model and tokenizer
model_name = "zjunlp/KnowRL-Skywork-OR1-7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to(device)

# Define the prompt using the model's template
prompt = "What is the main function of the mitochondria?"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate a response
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode and print the output
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
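The generated text normally contains the model's reasoning inside `<think>...</think>` followed by the final response inside `<answer>...</answer>`. If you only need the final answer, the snippet below is a minimal post-processing sketch; the helper name and the exact tag layout are assumptions for illustration, not part of the official API.

```python
import re

def split_think_answer(generated_text: str):
    """Separate the reasoning trace from the final answer.

    Assumes the model emits <think>...</think> followed by <answer>...</answer>;
    falls back to returning the full text if the tags are missing.
    """
    think_match = re.search(r"<think>(.*?)</think>", generated_text, re.DOTALL)
    answer_match = re.search(r"<answer>(.*?)</answer>", generated_text, re.DOTALL)
    thinking = think_match.group(1).strip() if think_match else ""
    answer = answer_match.group(1).strip() if answer_match else generated_text.strip()
    return thinking, answer

thinking, answer = split_think_answer(response)
print(answer)
```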
### Using `huggingface-cli`

You can also download the model from the command line using `huggingface-cli`:

```bash
huggingface-cli download zjunlp/KnowRL-Skywork-OR1-7B-Preview --local-dir KnowRL-Skywork-OR1-7B-Preview
```
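After downloading, you can point `from_pretrained` at the local directory instead of the Hub repository id. A minimal sketch, assuming the command above was run in the current working directory:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load from the locally downloaded directory instead of the Hub
local_dir = "./KnowRL-Skywork-OR1-7B-Preview"
tokenizer = AutoTokenizer.from_pretrained(local_dir)
model = AutoModelForCausalLM.from_pretrained(local_dir, torch_dtype=torch.bfloat16)
```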
## Training Details

The model's training process involves two distinct stages, using data from the `zjunlp/KnowRL-Train-Data` dataset.
- **Stage 1: Cold-Start SFT.** The base model undergoes supervised fine-tuning on the `knowrl_coldstart.json` dataset. This stage helps the model adopt a fact-based, slow-thinking response structure.
- **Stage 2: Knowledgeable RL.** The SFT-tuned model is further trained using reinforcement learning (GRPO). The reward function combines a correctness reward with a factuality reward, which is calculated by verifying the model's thinking process against an external knowledge base (a simplified sketch follows this list). This stage uses the `knowrl_RLdata.json` and `KnowRL_RLtrain_data_withknowledge.json` files.
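To make the reward composition concrete, here is an illustrative, simplified sketch of how a correctness reward and a factuality reward might be combined. The function names, weights, and the placeholder factuality check are assumptions for illustration only; the actual KnowRL reward implementation is in the GitHub repository.

```python
def fraction_of_supported_claims(thinking: str, knowledge_base: set[str]) -> float:
    """Placeholder factuality check: the fraction of sentences in the thinking
    trace that appear verbatim in the knowledge base. KnowRL's actual
    verification against external knowledge is more sophisticated."""
    claims = [s.strip() for s in thinking.split(".") if s.strip()]
    if not claims:
        return 0.0
    return sum(1 for c in claims if c in knowledge_base) / len(claims)


def combined_reward(thinking: str, answer: str, reference_answer: str,
                    knowledge_base: set[str],
                    w_correct: float = 1.0, w_fact: float = 1.0) -> float:
    """Illustrative reward: answer correctness plus factuality of the reasoning."""
    correctness = 1.0 if answer.strip().lower() == reference_answer.strip().lower() else 0.0
    factuality = fraction_of_supported_claims(thinking, knowledge_base)
    return w_correct * correctness + w_fact * factuality
```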
For complete details on the training configuration and hyperparameters, please refer to our GitHub repository.
## Citation
If you find this model useful in your research, please consider citing our paper:
```bibtex
@article{ren2025knowrl,
  title={{KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality}},
  author={Ren, Baochang and Qiao, Shuofei and Yu, Wenhao and Chen, Huajun and Zhang, Ningyu},
  journal={arXiv preprint arXiv:2506.19807},
  year={2025}
}
```