AI & ML interests

interpretability

Who are we?

We are a group of hackers from Stanford's NLP group, and we are interested in LLM interpretability.

We started with pyvene, whose name stands for PyTorch model intervention.

Resources

Supervised dictionary learning models (SDLs) and dataset releases for Gemma 2 2B and 9B: AxBench Collection.

Library for benchmarking interpretability methods at scale: AxBench.

Representation finetuning (ReFT) library: pyreft.

PyTorch model intervention library: pyvene.