Data Provenance Initiative

AI & ML interests

None defined yet.

Recent Activity

conceptofmind  authored a paper 12 days ago
Bridging the Data Provenance Gap Across Text, Speech and Video
conceptofmind  authored a paper 12 days ago
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Muennighoff  authored a paper about 2 months ago
Crosslingual Reasoning through Test-Time Scaling

Niklas Muennighoff, Damien Sileo, David Mataciunas, Shayne Longpre, Ahmad Mustafa Anis, Enrico Shippole, Kartik Perisetla, Robert Mahari, Minnie Liang

models 0

None public yet

datasets 25

DataProvenanceInitiative/common_pile_set

Viewer • Updated Mar 27 • 4.79M • 60

DataProvenanceInitiative/Megawika_corrected

Viewer • Updated Dec 15, 2024 • 556k • 411

DataProvenanceInitiative/stack-exchange-instruction-2split

Viewer • Updated Dec 8, 2024 • 10.8M • 179

DataProvenanceInitiative/Megawika_subset

Updated Nov 19, 2024 • 63

DataProvenanceInitiative/common_pile_ultra_permissive

Viewer • Updated Sep 9, 2024 • 7.05M • 174

DataProvenanceInitiative/Commercial_or_unspecified_licenses_and_terms

Viewer • Updated Sep 9, 2024 • 61M • 113

DataProvenanceInitiative/commercial_or_unspecified_licenses

Viewer • Updated Sep 9, 2024 • 74.6M • 230

DataProvenanceInitiative/commercial_licenses_and_terms

Viewer • Updated Sep 9, 2024 • 25.2M • 359

DataProvenanceInitiative/commercial_licenses

Viewer • Updated Sep 9, 2024 • 35M • 307 • 2

DataProvenanceInitiative/Everything

Viewer • Updated Sep 9, 2024 • 44.5M • 602 • 1