Jing

hij
AI & ML interests

None yet

Organizations

Mechanistic Interpretability Benchmark

authored 4 papers about 1 year ago

Rigorously Assessing Natural Language Explanations of Neurons

Paper • 2309.10312 • Published Sep 19, 2023

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

Paper • 2401.12631 • Published Jan 23, 2024

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Paper • 2403.07809 • Published Mar 12, 2024

RAVEL: Evaluating Interpretability Methods on Disentangling Language Model Representations

Paper • 2402.17700 • Published Feb 27, 2024