Data Selection

community
https://github.com/microsoft/LMOps/tree/main/data_selection

AI & ML interests

Data Selection for Language Models

Yuxian Gu's profile picture

Data-Selection's collections (2)

Baseline Models
Models Trained on Redpajama-CC
  • Data-Selection/BSL-160M

    Text Generation • Updated Oct 28, 2024 • 83
  • Data-Selection/BSL-470M

    Text Generation • Updated Oct 28, 2024 • 5
  • Data-Selection/BSL-1.7B

    Text Generation • Updated Mar 25 • 4
  • Data-Selection/BSL-1B

    Text Generation • Updated Oct 28, 2024 • 5
PDS-Models
Models trained on PDS-Selected Data
  • Data-Selection/PDS-160M

    Text Generation • 0.2B • Updated Mar 25 • 8
  • Data-Selection/PDS-470M

    Text Generation • 0.5B • Updated Mar 25 • 7
  • Data-Selection/PDS-1B

    Text Generation • Updated Mar 25 • 4
  • Data-Selection/PDS-1.7B

    Text Generation • Updated Mar 25 • 6