TomL

Aric

authored 2 papers 12 months ago

Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders

Paper • 2407.14435 • Published Jul 19, 2024 • 7

Progress measures for grokking via mechanistic interpretability

Paper • 2301.05217 • Published Jan 12, 2023

authored a paper about 2 years ago

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

Paper • 2307.09458 • Published Jul 18, 2023 • 11