The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 59
Android Models Collection LiteRT models that can run on Android • 20 items • Updated 21 days ago • 152
Lapa v0.1.2 Release Collection Release of SOTA Ukrainian LLM and Datasets • 18 items • Updated Nov 13, 2025 • 24
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Paper • 2005.11401 • Published May 22, 2020 • 14
view changelog Changelog Introducing HF Jobs: Run scalable compute jobs on Hugging Face Jul 30, 2025 • 200
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only Paper • 2306.01116 • Published Jun 1, 2023 • 41