Dr. Chad PhD (Doctor-Chad-PhD)
7 followers · 14 following
AI & ML interests: 😎
Recent Activity
Upvoted the article "Projected Abliteration" · 26 minutes ago
Reacted to grimjim's post with 🔥 · 26 minutes ago
I've uploaded abliteration code with support for sparsification of the refusal vector. It's poorly documented, but the code should be straightforward. https://github.com/jim-plus/llm-abliteration

The code is built atop a fork that enabled abliteration to be performed on models loaded in 4-bit or 8-bit bitsandbytes quantization. TransformerLens is not required, just plain Transformers. For those previously unaware, this opens up abliteration experimentation to more people with local VRAM limitations. Since performing abliteration on a quant involves precision and perplexity loss, it stands to reason that a small amount of magnitude sparsification could filter out some noise and possibly even reduce the damage inflicted on latent space via ablation of the refusal vector.

There's a small but real acceleration of ablation of the refusal vector by reducing outer product operations from O(d²×n) to O(d×n), and then by pushing said computation layerwise to GPU. The code is hardcoded for CUDA acceleration currently. Normalization of the refusal vector was deferred in order to allow sparsification. In principle other behavior vector interventions could also be added and explored.
Reacted to grimjim's post with ❤️ · 26 minutes ago
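To make the two ideas in the post above concrete, here is a minimal PyTorch sketch of (1) magnitude sparsification of a refusal direction, with normalization deferred until after sparsification, and (2) ablating that direction from a weight matrix as a rank-1 update, so the cost is O(d×n) instead of the O(d²×n) of materializing the d×d projection matrix, applied layerwise on the GPU. This is not the code from the linked repository: the function names (sparsify_refusal_vector, ablate_direction_, ablate_model), the keep_fraction parameter, and the Llama-style module layout are illustrative assumptions, and the sketch operates on full-precision weights rather than the bitsandbytes-quantized weights the repo also supports.

```python
import torch


def sparsify_refusal_vector(v: torch.Tensor, keep_fraction: float = 0.9) -> torch.Tensor:
    """Zero out the smallest-magnitude components of v, then renormalize.

    Normalization happens after sparsification, matching the post's note
    that normalization of the refusal vector was deferred.
    """
    k = max(1, int(keep_fraction * v.numel()))
    # the k-th largest magnitude becomes the keep threshold
    threshold = v.abs().kthvalue(v.numel() - k + 1).values
    sparse_v = torch.where(v.abs() >= threshold, v, torch.zeros_like(v))
    return sparse_v / sparse_v.norm()


def ablate_direction_(weight: torch.Tensor, v: torch.Tensor) -> None:
    """In-place ablation of unit direction v from the output space of `weight`:
    W <- W - v (v^T W).

    Computing v^T W first costs O(d*n); forming the d x d projection matrix
    (I - v v^T) and multiplying would cost O(d^2 * n).
    """
    coeffs = v @ weight                   # (n,) projections of each column onto v
    weight.sub_(torch.outer(v, coeffs))   # rank-1 update, no d x d matrix formed


def ablate_model(model, refusal_dir: torch.Tensor, keep_fraction: float = 0.9):
    """Apply the ablation layerwise, moving one layer's weights to the GPU
    at a time (CUDA assumed, as in the post)."""
    v = sparsify_refusal_vector(refusal_dir.float(), keep_fraction).cuda()
    for layer in model.model.layers:      # Llama-style layout assumed; varies by architecture
        for module in (layer.self_attn.o_proj, layer.mlp.down_proj):
            w = module.weight.data.float().cuda()
            ablate_direction_(w, v)
            module.weight.data.copy_(w.to(module.weight.dtype))
```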
Organizations: None yet
Models: 0 (none public yet)
Datasets: 0 (none public yet)