Mechanistic Interpretability Benchmark

university

https://mib-bench.github.io

AI & ML interests

Principled evaluation of mechanistic interpretability methods.

Recent Activity

amueller updated a Space 6 days ago

mib-bench/leaderboard

hij authored a paper about 2 months ago

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders

hij authored a paper about 2 months ago

LLMs Encode Harmfulness and Refusal Separately

mib-bench's models (3)

mib-bench/mib-circuits-example

mib-bench/mib-causalvariable-example

mib-bench/interpbench