State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M
Hugging Face TB Research
Enterprise
community
AI & ML interests
Exploring synthetic datasets, generated by Large Language Models (TB is for Textbook, as inspired by the "Textbooks are all your need" paper)
Organization Card
HuggingFaceTB
This is the home for small LLMs (SmolLM) and high quality pre-training datasets, such as Cosmopedia and Smollm-Corpus.
We released:
- Cosmopedia: the largest open synthetic dataset, with 25B tokens and more than 30M samples. It contains synthetic textbooks, blog posts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1.
- Cosmo-1B a 1B model trained on Cosmopedia.
- FineWeb-Edu: a filtered version of FineWeb dataset for educational content
- Smollm-Corpus: the pre-training corpus of SmolLM models including Cosmopedia v0.2, FineWeb-Edu and Python-Edu.
- SmolLM models and SmolLM2: a series of strong small models in three sizes: 135M, 360M and 1.7B
For more details check our blog posts: https://huggingface.co/blog/cosmopedia and https://huggingface.co/blog/smollm
models
26
HuggingFaceTB/SmolLM2-1.7B-Instruct
Text Generation
•
Updated
•
61.1k
•
335
HuggingFaceTB/SmolLM2-1.7B-Instruct-GGUF
Text Generation
•
Updated
•
4.65k
•
28
HuggingFaceTB/SmolLM2-135M-Instruct
Text Generation
•
Updated
•
14.7k
•
56
HuggingFaceTB/SmolLM2-360M
Text Generation
•
Updated
•
5.24k
•
22
HuggingFaceTB/SmolLM2-360M-Instruct
Text Generation
•
Updated
•
22.1k
•
44
HuggingFaceTB/SmolLM2-1.7B
Text Generation
•
Updated
•
9.79k
•
67
HuggingFaceTB/SmolLM2-135M
Text Generation
•
Updated
•
12.2k
•
27
HuggingFaceTB/SmolLM2-360M-Instruct-GGUF
Updated
•
1.08k
•
12
HuggingFaceTB/SmolLM-1.7B
Text Generation
•
Updated
•
9.55k
•
161
HuggingFaceTB/SmolLM-135M-Instruct
Text Generation
•
Updated
•
21.2k
•
98
datasets
29
HuggingFaceTB/MATH
Updated
•
144
•
2
HuggingFaceTB/smollm-corpus
Viewer
•
Updated
•
237M
•
12.8k
•
242
HuggingFaceTB/everyday-conversations-llama3.1-2k
Viewer
•
Updated
•
2.38k
•
622
•
77
HuggingFaceTB/instruct-data-basics-smollm-H4
Viewer
•
Updated
•
767
•
151
HuggingFaceTB/self-oss-instruct-sc2-H4
Viewer
•
Updated
•
50.7k
•
374
•
1
HuggingFaceTB/Magpie-Pro-300K-Filtered-H4
Viewer
•
Updated
•
300k
•
164
•
2
HuggingFaceTB/OpenHermes-2.5-H4
Viewer
•
Updated
•
1M
•
185
•
2
HuggingFaceTB/bisac_expanded_topics
Viewer
•
Updated
•
34.2k
•
38
HuggingFaceTB/cosmopedia
Viewer
•
Updated
•
31.1M
•
12.1k
•
563
HuggingFaceTB/python-edu-annotations
Viewer
•
Updated
•
491k
•
59
•
2