Organization Card

Who are we?

We are a group of hackers from Stanford's NLP group, and we are interested in LLM interpretability.

We started with pyvene, whose name stands for PyTorch model intervention.

Resources

Supervised dictionary learning models (SDLs) and dataset releases for Gemma 2 2B and 9B: AxBench Collection.

Library for benchmarking interpretability methods at scale (AxBench): AxBench.

Representation finetuning (ReFT) library: pyreft.

PyTorch model intervention library: pyvene.
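For readers new to the term, the idea behind a model intervention can be sketched in a few lines. The example below is a toy illustration in plain Python and does not use pyvene's API: the network, its weights, and the helper names (`run`, `interchange`) are all hypothetical. It runs a tiny two-layer model on a base input, then reruns it with one hidden unit overwritten by the value that unit takes on a different source input.

```python
# Illustrative only: a toy two-layer network in plain Python, not pyvene's API.

def hidden_layer(x, weights):
    """Linear layer followed by ReLU; returns the hidden activations."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weights]

def output_layer(h, weights):
    """Linear readout from hidden activations to a single score."""
    return sum(w * hi for w, hi in zip(weights, h))

def run(x, w_hidden, w_out, intervention=None):
    """Forward pass; `intervention` may rewrite the hidden activations."""
    h = hidden_layer(x, w_hidden)
    if intervention is not None:
        h = intervention(h)
    return output_layer(h, w_out)

def interchange(source_hidden, unit):
    """Build an intervention that copies one hidden unit from a source run."""
    def apply(h):
        patched = list(h)
        patched[unit] = source_hidden[unit]
        return patched
    return apply

# Hypothetical weights chosen so the arithmetic is easy to follow.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [2.0, 3.0]

base, source = [1.0, 1.0], [5.0, 0.0]
source_hidden = hidden_layer(source, W_HIDDEN)

clean = run(base, W_HIDDEN, W_OUT)
patched = run(base, W_HIDDEN, W_OUT, interchange(source_hidden, unit=0))
print(clean, patched)
```

Comparing `clean` with `patched` shows how much the output depends on the patched unit, which is the kind of question an intervention library is meant to help answer on real models.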

Collections 1

Spaces 7

SDL-ReFT-cr1 🫠 • Guide chatbot with specific topics • 2

SDL-ReFT-r1 🤔 • Guide conversations with specific topics • 3

ReFT-Golden-Gate-Bridge 🫠 • Converse with an AI assistant that mimics the Golden Gate Bridge • 4

ReFT-Chat7B 🫡 • Generate responses to chat messages using ReFT-Chat • 6

ReFT-Emoji 🫡 • Chat with an emoji-enhanced assistant • 3

ReFT-Ethos 🚀 • Converse with a helpful assistant in text form • 2

Models 12

Datasets 5

pyvene/axbench-concept16k_v2

Viewer • Updated Feb 11, 2025 • 3.46M • 176

pyvene/axbench-conceptFD

Viewer • Updated Jan 30, 2025 • 5.33k • 18 • 2

pyvene/axbench-concept16k

Viewer • Updated Jan 24, 2025 • 2.27M • 486 • 4

pyvene/axbench-concept500

Viewer • Updated Jan 24, 2025 • 297k • 298 • 2

pyvene/axbench-concept10

Viewer • Updated Jan 24, 2025 • 6.8k • 99 • 2

Team members 2
