Data Selection

community
https://github.com/microsoft/LMOps/tree/main/data_selection
AI & ML interests

Data Selection for Language Models

Yuxian Gu's profile picture

Collections 2

Baseline Models
Models trained on RedPajama-CC
  • Data-Selection/BSL-160M

    Text Generation • Updated Oct 28, 2024 • 21
  • Data-Selection/BSL-470M

    Text Generation • Updated Oct 28, 2024 • 15
  • Data-Selection/BSL-1.7B

    Text Generation • Updated Mar 25 • 21
  • Data-Selection/BSL-1B

    Text Generation • Updated Oct 28, 2024 • 15
PDS-Models
Models trained on PDS-selected data
  • Data-Selection/PDS-160M

    Text Generation • Updated Mar 25 • 17
  • Data-Selection/PDS-470M

    Text Generation • Updated Mar 25 • 21
  • Data-Selection/PDS-1B

    Text Generation • Updated Mar 25 • 14
  • Data-Selection/PDS-1.7B

    Text Generation • Updated Mar 25 • 22

Models 9

Data-Selection/PDS-470M

Text Generation • Updated Mar 25 • 21

Data-Selection/PDS-160M

Text Generation • Updated Mar 25 • 17

Data-Selection/PDS-1B

Text Generation • Updated Mar 25 • 14

Data-Selection/PDS-1.7B

Text Generation • Updated Mar 25 • 22

Data-Selection/BSL-1.7B

Text Generation • Updated Mar 25 • 21

Data-Selection/data_scorer

Updated Jan 5 • 3

Data-Selection/BSL-1B

Text Generation • Updated Oct 28, 2024 • 15

Data-Selection/BSL-470M

Text Generation • Updated Oct 28, 2024 • 15

Data-Selection/BSL-160M

Text Generation • Updated Oct 28, 2024 • 21

Datasets 0

None public yet