🤓Small-Datasets Collection Multi-stage high-quality datasets make the model more helpful! • 7 items • Updated 17 days ago • 3
🧠 Reasoning datasets Collection Datasets with reasoning traces for math and code released by the community • 23 items • Updated 1 day ago • 136
SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated Feb 20 • 52
Magpie Reasoning Datasets Collection Reasoning datasets built by Magpie and its friends! • 8 items • Updated Jan 27 • 10
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 131
GPT-4 generated datasets Collection Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16, 2024 • 10
A little guide to building Large Language Models in 2024 Collection Resources mentioned by @thomwolf in https://x.com/Thom_Wolf/status/1773340316835131757 • 19 items • Updated Apr 1, 2024 • 15
🔍 Interpretability & Analysis of LMs Collection Outstanding research in LM interpretability and evaluation, summarized • 109 items • Updated 10 days ago • 100