Content Metrics:
| Category | Safe Accuracy | Unsafe Accuracy |
|---|---|---|
| discredit | 0.91 | 0.95 |
| discrimination | 1.00 | 0.98 |
| drugs | 0.94 | 0.99 |
| pedophilia | 1.00 | 1.00 |
| religion | 1.00 | 0.99 |
| sexual_chat | 0.95 | 1.00 |
| sexual_content | 1.00 | 1.00 |
| suicide | 0.92 | 0.98 |
| swearing | 0.91 | 0.90 |
| violence | 0.99 | 1.00 |
| weapon | 0.88 | 0.99 |
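The card does not say how Safe Accuracy and Unsafe Accuracy are computed. One plausible reading, assumed here, is per-class accuracy within each category. Under that reading, Safe Accuracy is the share of truly safe examples the model labels safe, and Unsafe Accuracy is the share of truly unsafe examples it labels unsafe. The following is a minimal sketch under that assumption. The record layout `(category, true_label, predicted_label)` and the function name `per_category_accuracy` are hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: each record is (category, true_label, predicted_label), where the
# labels are "safe" or "unsafe". The data behind the table above is not published here.
Record = tuple[str, str, str]


def per_category_accuracy(records: list[Record]) -> dict[str, dict[str, float]]:
    """Return {category: {true_label: accuracy}}, where accuracy is the fraction
    of examples with that true label whose prediction matches it."""
    total: dict[tuple[str, str], int] = defaultdict(int)
    correct: dict[tuple[str, str], int] = defaultdict(int)
    for category, true_label, predicted in records:
        total[(category, true_label)] += 1
        if predicted == true_label:
            correct[(category, true_label)] += 1
    results: dict[str, dict[str, float]] = {}
    for (category, true_label), n in total.items():
        results.setdefault(category, {})[true_label] = correct[(category, true_label)] / n
    return results


# Toy records for illustration only (not from the model's evaluation)
toy = [
    ("drugs", "safe", "safe"),
    ("drugs", "safe", "unsafe"),
    ("drugs", "unsafe", "unsafe"),
    ("weapon", "unsafe", "unsafe"),
]
for category, scores in per_category_accuracy(toy).items():
    print(category, {label: round(acc, 2) for label, acc in scores.items()})
```

The two labels are scored separately instead of being pooled into one accuracy. This keeps the two error types distinct: safe text flagged as harmful, and harmful text passed as safe.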
To load this model, load the Qwen2.5-3B-Instruct base model and attach the adapter weights with PEFT:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the Qwen2.5-3B-Instruct base model and its tokenizer
base_model = AutoModelForCausalLM.from_pretrained('Qwen/Qwen2.5-3B-Instruct', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('Qwen/Qwen2.5-3B-Instruct', trust_remote_code=True)
# Attach the fine-tuned adapter on top of the base model
model = PeftModel.from_pretrained(base_model, 'raft-security-lab/harm-qwen-2.5-3b-dora-responses')
```