Pietro Lesci

pietrolesci

https://pietrolesci.github.io/

AI & ML interests

I like developing and applying causal methods to study the effect of training choices on models’ behaviour, including memorisation, shortcut learning, and tokenisation.

Recent Activity

updated a model about 1 month ago

pietrolesci/small_bpe128k

updated a collection about 1 month ago

UnimixLM

published a model about 1 month ago

pietrolesci/small_bpe128k

New activity in cmeister/multilingual-tok-corpus 2 months ago

Create README.md

#2 opened 2 months ago by

pietrolesci

New activity in JeanKaddour/minipile 11 months ago

Domain and provenance annotation

#1 opened almost 2 years ago by

haukur

New activity in HuggingFaceTB/SmolLM-135M about 1 year ago

Trapezoidal scheduler with cooldown phase

👍 1

#4 opened about 1 year ago by

maveriq

New activity in EleutherAI/pythia-160m over 1 year ago

Tokenizer `merges.txt` files

#5 opened over 1 year ago by

pietrolesci

New activity in EleutherAI/pile-deduped-pythia-preshuffled over 1 year ago

Sequence "packing" logic

👍 2

#2 opened over 1 year ago by

pietrolesci

Pad-only sequences from mmap'ed dataset after a certain index

#1 opened over 1 year ago by

pietrolesci

New activity in EleutherAI/pile-duped-pythia-random-sampled over 1 year ago

Add full sequences (beyond the first 64 tokens)

#1 opened over 1 year ago by

pietrolesci

New activity in JeanKaddour/minipile almost 2 years ago

Domain and provenance annotation

#1 opened almost 2 years ago by

haukur
