Update README.md
README.md
CHANGED
@@ -20,6 +20,8 @@ Note that this is fairly experimental, so it might not turn out as well as expec
 
 ## 🧠 Abliteration v2
 
+
+
 In the original technique, a refusal direction is computed by comparing the residual streams between target (harmful) and baseline (harmless) samples.
 
 Here, the model was abliterated by computing a refusal direction for each layer, independently.
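
As a rough illustration of the per-layer computation described in the README text, the sketch below derives one refusal direction per layer from residual-stream activations. It rests on several assumptions that the README does not confirm: that "comparing" means taking a difference of mean activations, that each direction is normalized to unit length, and that activations have already been collected into arrays of shape `(n_samples, n_layers, d_model)`. The function name `refusal_directions` and the synthetic data are hypothetical, not part of the author's code.

```python
import numpy as np


def refusal_directions(harmful: np.ndarray, harmless: np.ndarray) -> np.ndarray:
    """Return one unit-norm direction per layer.

    Both inputs are assumed to have shape (n_samples, n_layers, d_model).
    Assumption: the direction is the difference of the mean activations,
    computed for every layer independently of the others.
    """
    # Average over samples, leaving one vector per layer.
    diff = harmful.mean(axis=0) - harmless.mean(axis=0)  # (n_layers, d_model)
    # Normalize each layer's vector on its own; guard against zero norm.
    norms = np.linalg.norm(diff, axis=-1, keepdims=True)
    return diff / np.clip(norms, 1e-12, None)


# Synthetic stand-ins for recorded activations (purely illustrative).
rng = np.random.default_rng(0)
harmful_acts = rng.normal(size=(8, 3, 16)) + 1.0
harmless_acts = rng.normal(size=(8, 3, 16))

directions = refusal_directions(harmful_acts, harmless_acts)
print(directions.shape)
```

Because each layer is normalized separately, the direction found at one layer does not influence the direction at any other, which matches the "for each layer, independently" wording above.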