HuggingFaceTB's Collections
SmolLM3 pretraining datasets
Updated 10 days ago
Datasets used in SmolLM3 pretraining.
HuggingFaceFW/fineweb-edu • Updated 7 days ago • 3.5B rows • 159k downloads • 718 likes
mlfoundations/dclm-baseline-1.0 • Updated Jul 22, 2024 • 116k downloads • 225 likes
epfml/FineWeb2-HQ • Updated Feb 19 • 380M rows • 13.6k downloads • 17 likes
HuggingFaceTB/finemath • Updated Feb 6 • 48.3M rows • 20.1k downloads • 329 likes
bigcode/the-stack-v2 • Updated Apr 23, 2024 • 5.45B rows • 3.36k downloads • 387 likes
HuggingFaceTB/issues-kaggle-notebooks • Updated Mar 19 • 16.1M rows • 355 downloads • 10 likes
LLM360/MegaMath • Updated Apr 9 • 217M rows • 41k downloads • 96 likes
HuggingFaceTB/stack-edu • Updated Mar 20 • 167M rows • 1.83k downloads • 43 likes
Note: Stage 2 new datasets
HuggingFaceTB/smollm-corpus • Updated Sep 6, 2024 • 237M rows • 23k downloads • 349 likes
allenai/dolmino-mix-1124 • Updated Dec 17, 2024 • 165M rows • 17.2k downloads • 66 likes
nvidia/OpenMathReasoning • Updated May 27 • 5.68M rows • 15.7k downloads • 308 likes
nvidia/OpenCodeReasoning • Updated May 4 • 753k rows • 3.53k downloads • 481 likes
facebook/natural_reasoning • Updated Feb 21 • 1.15M rows • 1.13k downloads • 511 likes
Note: Stage 3 (decay) new datasets