Content Metrics:
Category Safe Accuracy Unsafe Accuracy
discredit 0.95 0.95
discrimination 1.00 0.54
drugs 0.98 0.96
pedophilia 0.99 0.99
religion 1.00 0.99
sexual_chat 0.97 0.98
sexual_content 1.00 0.99
suicide 0.97 1.00
swearing 1.00 0.97
violence 1.00 0.99
weapon 0.91 0.97
To load this model, use the following command:
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-3B-Instruct', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-3B-Instruct', trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, 'raft-security-lab/harm-qwen-2.5-3b-dora-requests')
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support