Pietro Lesci (pietrolesci)
AI & ML interests
I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.
Recent Activity
Updated a dataset (41 minutes ago): pietrolesci/pile-deduped
Updated a dataset (about 20 hours ago): pietrolesci/pile-deduped-pythia-preshuffled
Updated a collection (5 days ago): The Pile Datasets
Organizations
Collections (9)
Spaces (1)
Models (21)
pietrolesci/me100M_finewebedu-20B_bpe32000minipile • 52 downloads
pietrolesci/me100M-tied_finewebedu-20B_bpe32000minipile • 51 downloads
pietrolesci/me850M_minipile_bpe32000minipile • 54 downloads
pietrolesci/me340M-tied_minipile_bpe32000minipile • 56 downloads
pietrolesci/me57M-tied_minipile_bpe2wp32000minipile
pietrolesci/me57M-tied_minipile_bpe128000minipile
pietrolesci/me57M-tied_minipile_wordpiece32000minipile
pietrolesci/me57M-tied_minipile_bpe8064minipile
pietrolesci/me57M-tied_minipile_bpe32000minipile
pietrolesci/tokenisers
Datasets (55)
pietrolesci/pile-deduped • 574M rows • 40 downloads
pietrolesci/pile-deduped-pythia-preshuffled • 97.6M rows • 637 downloads
pietrolesci/pile-validation • 429k rows • 71 downloads
pietrolesci/pile-deduped-pythia-tokfreq • 50.1k rows • 21 downloads
pietrolesci/finewebedu-20B • 40.4M rows • 161 downloads
pietrolesci/me-minipile-evals • 1.82M rows • 140 downloads
pietrolesci/minipile • 6.06M rows • 419 downloads
pietrolesci/opus-5langs-1M • 5M rows • 108 downloads
pietrolesci/opus-raw • 4.06B rows • 1.98k downloads
pietrolesci/pythia-pile-stats • 113M rows • 92 downloads