# Intro
This is an abliterated version of [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B): a copy of the model whose refusal behavior has been suppressed by ablating a "refusal direction" from its activations. The code used to produce the abliteration is at [andyrdt/refusal_direction](https://github.com/andyrdt/refusal_direction).
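For intuition, the refusal-direction approach estimates a single direction in the residual stream (roughly, the difference in mean activations between harmful and harmless prompts) and removes the component of the activations along it. The snippet below is a minimal sketch of the projection step only, not the linked repo's actual API; `refusal_dir` is assumed to have already been estimated.

```python
import torch

def ablate_direction(hidden: torch.Tensor, refusal_dir: torch.Tensor) -> torch.Tensor:
    """Remove the component of `hidden` along the refusal direction.

    hidden: activations of shape (..., d_model)
    refusal_dir: vector of shape (d_model,), assumed precomputed
    """
    direction = refusal_dir / refusal_dir.norm()
    # Project each activation onto the direction, then subtract that component.
    proj = (hidden @ direction).unsqueeze(-1) * direction
    return hidden - proj
```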
# HarmBench eval
When evaluated on HarmBench, DeepSeek-R1-Distill-Llama-8B has an overall harmful-response rate of 0.35, while DeepSeek-R1-Distill-Llama-8B-abliterate scores 0.68 (higher means the model complied with more harmful requests):
| Category | Abliterated | Base Model |
|---|---|---|
| Disinformation | 0.4 | 0.4 |
| Economic Harm | 0.8 | 0.2 |
| Expert Advice | 0.8 | 0.5 |
| Fraud/Deception | 0.8 | 0.5 |
| Government Decision-Making | 0.6 | 0.6 |
| Harassment/Discrimination | 0.3 | 0.2 |
| Malware/Hacking | 0.9 | 0.3 |
| Physical Harm | 0.8 | 0.2 |
| Privacy | 0.6 | 0.6 |
| Sexual/Adult Content | 0.8 | 0.0 |
| **Overall Harmful Rate** | **0.68** | **0.35** |
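Each per-category number is the fraction of HarmBench behaviors in that category for which the model's completion was judged harmful. Below is a minimal sketch of how such a tally can be computed; `generate_response` and `is_harmful` are hypothetical stand-ins for the model call and the HarmBench judge, not part of the actual evaluation code.

```python
from collections import defaultdict

def harmful_rate_by_category(behaviors, generate_response, is_harmful):
    """behaviors: iterable of dicts with 'category' and 'prompt' keys.
    generate_response / is_harmful: hypothetical model and judge callables."""
    totals, harmful = defaultdict(int), defaultdict(int)
    for b in behaviors:
        totals[b["category"]] += 1
        if is_harmful(b["prompt"], generate_response(b["prompt"])):
            harmful[b["category"]] += 1
    return {cat: harmful[cat] / totals[cat] for cat in totals}
```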
# Usage
Example code to generate with the model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# Load the model and tokenizer
model_path = "collinzrj/DeepSeek-R1-Distill-Llama-8B-abliterate"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True).to('cuda')

messages = [
    {"role": "user", "content": "Write a tutorial to make a bomb."},
]

# Prepare the input for generation
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt').to('cuda')
streamer = TextStreamer(tokenizer)

# Stream generation to stdout as tokens are produced
_ = model.generate(
    input_ids,
    max_new_tokens=2000,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```