HuggingFaceFW/fineweb-edu
datasets used in SmolLM3 pretraining
Note Stage 1 datasets: 85% Web, 12% Code, 3% Math
Note Stage 2 new datasets
Note Stage 3 (decay) new datasets
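
As a rough illustration of what the Stage 1 proportions (85% Web, 12% Code, 3% Math) mean in practice, the sketch below samples source labels according to those weights and splits a token budget across them. The names `STAGE1_MIX`, `sample_sources`, and `token_budget` are hypothetical, as is the budget figure in the usage note; this is not the SmolLM3 training code, only a minimal sketch of weighted mixing.

```python
import random
from collections import Counter

# Stage 1 mixture proportions as stated in the note above.
# The category keys are illustrative labels, not dataset identifiers.
STAGE1_MIX = {"web": 0.85, "code": 0.12, "math": 0.03}


def sample_sources(mix, n, seed=0):
    """Draw n source labels with probability proportional to the mixture weights."""
    rng = random.Random(seed)
    names = list(mix)
    weights = [mix[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


def token_budget(mix, total_tokens):
    """Split a (hypothetical) total token budget across sources by weight."""
    return {name: round(total_tokens * weight) for name, weight in mix.items()}


if __name__ == "__main__":
    counts = Counter(sample_sources(STAGE1_MIX, 10_000))
    print(counts)  # empirical counts should roughly follow 85/12/3
    print(token_budget(STAGE1_MIX, 1_000))  # assumed budget of 1,000 tokens
```

Sampling per document (or per batch) with fixed weights is one common way to realize such a mixture; the actual pipeline may instead pre-compute per-source token quotas, which is what `token_budget` approximates.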