This jailbroken LLM is released strictly for academic research purposes in AI safety and model alignment studies. The author bears no responsibility for any misuse or harm resulting from the deployment of this model. Users must comply with all applicable laws and ethical guidelines when conducting research.

A jailbroken Qwen3-8B model using weight orthogonalization[1].

Implementation script: https://gist.github.com/cooperleong00/14d9304ba0a4b8dba91b60a873752d25

[1]: Arditi, Andy, et al. "Refusal in language models is mediated by a single direction." arXiv preprint arXiv:2406.11717 (2024).

Downloads last month
28
Safetensors
Model size
8.19B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for cooperleong00/Qwen3-8B-Jailbroken

Base model

Qwen/Qwen3-8B-Base
Finetuned
Qwen/Qwen3-8B
Finetuned
(40)
this model
Quantizations
3 models