Gabriele Sarti PRO

gsarti

https://gsarti.com

AI & ML interests

Interpretability for generative language models

Recent Activity

liked a model 1 day ago

chandar-lab/NeoBERT

updated a collection 4 days ago

🇮🇹 Italian NLP Resources

liked a model 4 days ago

Fastweb/FastwebMIIA-7B

upvoted a collection 9 days ago

ELI-Why

🧠 ELI-Why: Evaluating the Pedagogical Utility of Language Model Explanations ACL Findings 2025 • 4 items • Updated 16 days ago • 3

upvoted an article 10 days ago

Enhance Your Models in 5 Minutes with the Hugging Face Kernel Hub

Article • By … and 6 others • 15 days ago • 100

upvoted a paper 13 days ago

Decomposing MLP Activations into Interpretable Features via Semi-Nonnegative Matrix Factorization

Paper • 2506.10920 • Published 15 days ago • 6

upvoted a paper 22 days ago

From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit

Paper • 2506.03093 • Published 24 days ago • 2

upvoted a collection 22 days ago

🧠 Reasoning datasets

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 152

upvoted a paper 24 days ago

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

Paper • 2506.01939 • Published 25 days ago • 162

upvoted 2 articles 24 days ago

The Transformers Library: standardizing model definitions

Article • By … and 3 others • May 15 • 114

Context Is Gold to Find the Gold Passage: Evaluating and Training Contextual Document Embeddings

Article • By … and 1 other • 25 days ago • 24

upvoted a collection 28 days ago

FAMA

The First Large-Scale Open-Science Speech Foundation Model for English and Italian • 5 items • Updated 28 days ago • 7

upvoted 4 papers 28 days ago

Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement

Paper • 2505.23183 • Published 29 days ago • 2

SAEs Are Good for Steering -- If You Select the Right Features

Paper • 2505.20063 • Published May 26 • 1

Mechanistic evaluation of Transformers and state space models

Paper • 2505.15105 • Published May 21 • 1

Improved Representation Steering for Language Models

Paper • 2505.20809 • Published about 1 month ago • 1

upvoted a paper about 1 month ago

Steering Large Language Models for Machine Translation Personalization

Paper • 2505.16612 • Published May 22 • 6

upvoted 2 papers about 2 months ago

Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills

Paper • 2410.04253 • Published Oct 5, 2024 • 1

MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools

Paper • 2504.20168 • Published Apr 28 • 1

upvoted a paper 2 months ago

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float

Paper • 2504.11651 • Published Apr 15 • 28

upvoted an article 2 months ago

Tiny Agents: a MCP-powered agent in 50 lines of code

Article • Apr 25 • 283

upvoted a collection 2 months ago

MIB Datasets

The tasks and counterfactuals from the Mechanistic Interpretability Benchmark. • 7 items • Updated Apr 16 • 3

upvoted a paper 2 months ago

NNsight and NDIF: Democratizing Access to Foundation Model Internals

Paper • 2407.14561 • Published Jul 18, 2024 • 36