Mor Geva
mega
AI & ML interests: None yet
Recent Activity
Authored a paper about 1 month ago: Universal Jailbreak Suffixes Are Strong Attention Hijackers
Authored a paper about 2 months ago: Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization
Authored a paper 6 months ago: Open Problems in Mechanistic Interpretability