common-pile's Collections

Common Pile v0.1

updated 3 days ago

All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text

  • Common Pile v0.1 Raw Data

    Collection
    8TB of public domain and openly licensed text • 30 items • Updated 3 days ago • 11

  • Common Pile v0.1 Filtered Data

    Collection
    An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 3 days ago • 10

  • Comma v0.1 Artifacts

    Collection
    A collection of artifacts related to Comma v0.1—a 7B parameter LLM trained on public domain and openly licensed text • 3 items • Updated 3 days ago • 4

  • The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

    Paper • 2506.05209 • Published 4 days ago • 29