rain2sun's Collections
High-Quality-Datasets
Updated Dec 2, 2024
High-quality datasets with a high density of knowledge.
wikimedia/wikipedia • Updated Jan 9, 2024 • 61.6M • 41.4k • 898
OpenCoder-LLM/opc-annealing-corpus • Updated May 29 • 15.6M • 1.32k • 37
hltcoe/megawika • Updated Jan 31 • 19.5k • 40
allenai/dolmino-mix-1124 • Updated Dec 17, 2024 • 165M • 23.2k • 69