Improve model card with metadata and description
This PR improves the model card by:
- Adding `pipeline_tag: text-classification` to reflect the model's purpose.
- Specifying `library_name: fasttext`, since the model uses fastText for filtering.
- Confirming `license: mit`.
- Providing a more detailed description of the model and its usage.
- Adding a link to the GitHub repository.
These changes should make this data filtering resource easier to discover and use.
README.md
CHANGED
@@ -1,6 +1,9 @@
 ---
 license: mit
+library_name: fasttext
+pipeline_tag: text-classification
 ---
-
-the LAMBADA FR task, discussed in the main text of the Perplexity
-
+
+This is the fastText pretraining data filter targeting the LAMBADA FR task, discussed in the main text of the Perplexity Correlations paper: https://arxiv.org/abs/2409.05816. This filter uses perplexity correlations to identify high-quality pretraining data without requiring any LLM training. It is designed to be used with the `fastText` library.
+
+Github: https://github.com/TristanThrush/perplexity-correlations
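The updated card says the filter is meant to be used with the `fastText` library but does not show how. The following is a minimal sketch of one way to apply such a filter, not the authors' documented procedure. The model file name, the `__label__include` label, and the `0.5` threshold are assumptions for illustration; check the repository for the actual label names and recommended settings.

```python
from typing import Iterable, List


def load_filter(path: str):
    """Load a fastText binary model from disk (requires the `fasttext` package)."""
    import fasttext  # imported lazily so the helpers below work without it

    return fasttext.load_model(path)


def include_score(model, text: str, label: str = "__label__include") -> float:
    """Return the probability the model assigns to `label` for `text`.

    `label` is an assumed label name; a label the model does not emit scores 0.0.
    """
    # fastText's predict() processes one line at a time, so flatten newlines.
    line = " ".join(text.split())
    labels, probs = model.predict(line, k=-1)  # k=-1 asks for every label
    scores = {lab: float(p) for lab, p in zip(labels, probs)}
    return scores.get(label, 0.0)


def filter_documents(
    model,
    docs: Iterable[str],
    threshold: float = 0.5,  # assumed cutoff, tune for your data
    label: str = "__label__include",
) -> List[str]:
    """Keep only the documents whose score for `label` reaches `threshold`."""
    return [d for d in docs if include_score(model, d, label) >= threshold]


# Hypothetical usage (the file name is a placeholder, not taken from the repo):
# model = load_filter("fasttext_filter.bin")
# kept = filter_documents(model, ["Un exemple de texte.", "Autre document."])
```

The scoring helpers accept any object with a fastText-style `predict(text, k=...)` method, so the thresholding logic can be checked with a stub before the real model is downloaded.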