HuggingFaceTB's Collections
Cosmopedia
Updated May 5
Resources for the Cosmopedia dataset
HuggingFaceTB/cosmopedia • Dataset • Updated Aug 12, 2024 • 31.1M rows • 7.1k downloads • 616 likes
HuggingFaceTB/cosmo-1b • Model (Text Generation) • Updated Jul 8, 2024 • 492 downloads • 131 likes
Web clusters • Space • Running • 6 likes
Browse and explore clustered web samples by educational value
HuggingFaceTB/cosmopedia-100k • Dataset • Updated Feb 19, 2024 • 100k rows • 480 downloads • 43 likes
HuggingFaceTB/cosmopedia-meta • Dataset • Updated Feb 20, 2024 • 31.1M rows • 34 downloads • 2 likes
HuggingFaceTB/smollm-corpus • Dataset • Updated Sep 6, 2024 • 237M rows • 11.3k downloads • 334 likes