Data Provenance Initiative

Recent Activity

sileod authored a paper 7 days ago
Humanity's Last Exam
conceptofmind authored a paper 3 months ago
Bridging the Data Provenance Gap Across Text, Speech and Video
conceptofmind authored a paper 3 months ago
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text

Team members: Niklas Muennighoff, Damien Sileo, David Mataciunas, Shayne Longpre, Ahmad Mustafa Anis, Enrico Shippole, Kartik Perisetla, Robert Mahari, Minnie Liang

models 0

None public yet

datasets 25

DataProvenanceInitiative/common_pile_set

Viewer • Updated Mar 27 • 4.79M • 24

DataProvenanceInitiative/Megawika_corrected

Viewer • Updated Dec 15, 2024 • 556k • 254

DataProvenanceInitiative/stack-exchange-instruction-2split

Viewer • Updated Dec 8, 2024 • 10.8M • 54

DataProvenanceInitiative/Megawika_subset

Updated Nov 19, 2024 • 17

DataProvenanceInitiative/common_pile_ultra_permissive

Viewer • Updated Sep 9, 2024 • 7.05M • 16

DataProvenanceInitiative/Commercial_or_unspecified_licenses_and_terms

Viewer • Updated Sep 9, 2024 • 61M • 43

DataProvenanceInitiative/commercial_or_unspecified_licenses

Viewer • Updated Sep 9, 2024 • 74.6M • 51

DataProvenanceInitiative/commercial_licenses_and_terms

Viewer • Updated Sep 9, 2024 • 25.2M • 89

DataProvenanceInitiative/commercial_licenses

Viewer • Updated Sep 9, 2024 • 35M • 68 • 2

DataProvenanceInitiative/Everything

Viewer • Updated Sep 9, 2024 • 44.5M • 294 • 1