arxiv:2501.08365
Sebastian Majstorovic PRO
storytracer
AI & ML interests
Open Data Specialist @EleutherAI
Recent Activity
upvoted an article (1 day ago): Open-R1: a fully open reproduction of DeepSeek-R1
liked a dataset (2 days ago): speechbrain/LargeScaleASR
authored a paper (14 days ago): Towards Best Practices for Open Datasets for LLM Training
Papers: 1
Models: None public yet
Datasets: 9
storytracer/usgpo • Viewer • Updated • 3.75M • 11
storytracer/public_library_1929_dolma • Viewer • Updated • 9.08k • 40
storytracer/hathi_full_20240501 • Viewer • Updated • 18.4M • 40
storytracer/hathi_pd_books_us_ia_2024-05-01 • Viewer • Updated • 225k • 41
storytracer/openlibrary_dump_2024-04-30 • Preview • Updated • 69
storytracer/loc_books_dolma • Viewer • Updated • 14.1k • 73
storytracer/German-PD-Newspapers • Viewer • Updated • 5.38M • 278 • 4
storytracer/LoC-PD-Books • Viewer • Updated • 16.5k • 660 • 28
storytracer/US-PD-Books • Viewer • Updated • 654k • 283 • 181