Committed by nielsr (HF Staff)
Commit 58d254d · verified · 1 parent: 323a419

Improve model card with metadata and description

This PR improves the model card by:

- Adding the `pipeline_tag: text-classification` to better reflect the model's purpose.
- Specifying the `library_name: fasttext` as the model uses fastText for filtering.
- Confirming the `license: mit`.
- Providing a more detailed description of the model and its usage.
- Adding a link to the GitHub repository.

This will improve the discoverability and usability of this valuable data filtering resource.
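To check that a model card carries the three metadata fields this PR relies on (`license`, `library_name`, `pipeline_tag`), here is a minimal stdlib-only Python sketch. The helper `read_front_matter` and the `REQUIRED_KEYS` tuple are hypothetical names introduced for illustration, not part of any Hugging Face library. The parser only understands flat `key: value` lines, so for real cards a proper YAML parser (or the `huggingface_hub` model card utilities) would be more robust.

```python
# Minimal sketch: reads only flat "key: value" pairs between the opening and
# closing "---" lines. It is NOT a full YAML parser (no lists, nesting, or quoting).
REQUIRED_KEYS = ("license", "library_name", "pipeline_tag")  # hypothetical check list


def read_front_matter(text: str) -> dict:
    """Return the flat key/value pairs from a model card's YAML front matter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # card has no front matter block
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta  # reached the closing delimiter
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            meta[key.strip()] = value.strip()
    return {}  # opening "---" was never closed


# Front matter of README.md after this PR, followed by a shortened description.
README = """---
license: mit
library_name: fasttext
pipeline_tag: text-classification
---

This is the fastText pretraining data filter targeting the LAMBADA FR task.
"""

meta = read_front_matter(README)
missing = [key for key in REQUIRED_KEYS if key not in meta]
print(meta)
print("missing:", missing)
```

Splitting on the first colon keeps values such as URLs intact, which is enough for the simple scalar fields this card uses.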

Files changed (1):
1. README.md (+6 -3)

README.md (changed):
@@ -1,6 +1,9 @@
 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
-This is the fastText pretraining data filter targeting
-the LAMBADA FR task, discussed in the main text of the Perplexity
-Correlations paper: https://arxiv.org/abs/2409.05816
+
+This is the fastText pretraining data filter targeting the LAMBADA FR task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816. This filter uses perplexity correlations to identify high-quality pretraining data without requiring any LLM training. It is designed to be used with the `fastText` library.
+
+Github: https://github.com/TristanThrush/perplexity-correlations
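The updated card says the filter is meant to be used with the `fastText` library but does not show a call. Below is a minimal Python sketch of how a fastText text classifier is commonly applied as a document filter, under stated assumptions: the label name `__label__include`, the 0.5 threshold, and the helpers `keep_probability` and `filter_documents` are all hypothetical, and a stub stands in for the real model's `predict` so the example runs offline. It is not the method from the paper or the repository.

```python
from typing import Callable, Iterable, Iterator, Sequence, Tuple

# Shape of fastText's Python `model.predict(text, k=...)`: it returns a tuple
# (labels, probabilities) for a single line of text.
PredictFn = Callable[..., Tuple[Sequence[str], Sequence[float]]]


def keep_probability(predict: PredictFn, text: str, keep_label: str) -> float:
    """Score one document: the probability assigned to `keep_label`."""
    # fastText predicts one line at a time and rejects "\n", so collapse all
    # whitespace (including newlines) into single spaces first.
    line = " ".join(text.split())
    # k=-1 asks fastText for scores over all labels, not just the top one.
    labels, probs = predict(line, k=-1)
    for label, prob in zip(labels, probs):
        if label == keep_label:
            return float(prob)
    return 0.0  # label absent from the prediction


def filter_documents(docs: Iterable[str], predict: PredictFn,
                     keep_label: str, threshold: float = 0.5) -> Iterator[str]:
    """Yield the documents whose `keep_label` probability meets `threshold`."""
    for doc in docs:
        if keep_probability(predict, doc, keep_label) >= threshold:
            yield doc


# Stand-in classifier so the sketch runs without downloading the model.
# HYPOTHETICAL labels: the real filter's label names may differ.
def fake_predict(line: str, k: int = 1):
    assert "\n" not in line  # mirrors fastText's one-line requirement
    p = 0.9 if "bonjour" in line.lower() else 0.2
    return ("__label__include", "__label__exclude"), (p, 1.0 - p)


docs = ["Bonjour,\nle chat dort.", "Hello, the cat sleeps."]
kept = list(filter_documents(docs, fake_predict, keep_label="__label__include"))
print(kept)
```

With the actual filter, `predict` would be the bound method of a model loaded via `fasttext.load_model(path)`, where `path` points to the downloaded `.bin` file (its filename is not given in this card). The label to keep and the threshold are assumptions; the perplexity-correlations GitHub repository is the place to check how the filter's scores are meant to be used.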