kenhktsui/code-natural-language-fasttext-classifier

Text Classification


kenhktsui committed on Oct 30, 2024

Commit d87b3ea (verified) · Parent: c2ba415

Update README.md

Files changed (1)

README.md +12 -1

README.md CHANGED

@@ -14,7 +14,7 @@ library_name: fasttext
 [Dataset](https://huggingface.co/datasets/kenhktsui/code-natural-language-classification-dataset)
 This classifier classifies a text into Code or NaturalLanguage.
-The model is trained over 3.24M records, which is a mix of code and natural langauge and achieved a test F1 score of 0.99.
+The model is trained over 3.24M records, which is a mix of code and natural langauge and achieved a test F1 score of 0.97.
 The classifier can be used for LLM pretraining data curation, to route a text into different pipeline (e.g. code syntax check).
 It is ultra fast ⚡ with a throughtput of ~2000 doc/s with CPU.
@@ -50,6 +50,17 @@ predict([
 # {'label': 'Code', 'score': 1.00001},
 # {'label': 'Code', 'score': 1.000009}]
 ```
+## 📊Evaluation
+```
+                 precision    recall  f1-score   support
+           Code       0.97      1.00      0.98    581282
+NaturalLanguage       1.00      0.92      0.95    228993
+       accuracy                           0.98    810275
+      macro avg       0.98      0.96      0.97    810275
+   weighted avg       0.98      0.98      0.98    810275
+```
 ## 📝Definition of Label
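The evaluation table added in this commit appears to follow the layout of scikit-learn's `classification_report`. As a sanity check on how its summary rows relate to the per-class rows, and why the revised headline F1 of 0.97 appears to match the macro-average F1 rather than the weighted one, the sketch below recomputes both averages in plain Python. The names `PER_CLASS`, `REPORTED`, `macro_avg`, `weighted_avg` and `compare` are hypothetical helpers for illustration only. The inputs are the rounded two-decimal values from the table, so agreement is only expected to within about 0.01.

```python
"""Recompute the average rows of the evaluation table from its per-class rows.

Assumption: the per-class numbers below are copied from the README diff and
are rounded to two decimals, so recomputed averages may differ from the
reported ones by up to about 0.01.
"""

# Per-class rows from the evaluation table: (precision, recall, f1, support).
PER_CLASS = {
    "Code": (0.97, 1.00, 0.98, 581282),
    "NaturalLanguage": (1.00, 0.92, 0.95, 228993),
}

# Summary rows as reported in the same table: (precision, recall, f1).
REPORTED = {
    "macro avg": (0.98, 0.96, 0.97),
    "weighted avg": (0.98, 0.98, 0.98),
}

METRICS = ("precision", "recall", "f1")


def macro_avg(rows):
    """Unweighted mean of each metric across classes."""
    n = len(rows)
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(3))


def weighted_avg(rows):
    """Mean of each metric across classes, weighted by class support."""
    total = sum(r[3] for r in rows.values())
    return tuple(sum(r[i] * r[3] for r in rows.values()) / total for i in range(3))


def compare(rows, reported, tol=0.01):
    """Return (row, metric, recomputed, reported, within_tol) tuples."""
    recomputed = {"macro avg": macro_avg(rows), "weighted avg": weighted_avg(rows)}
    out = []
    for name, values in recomputed.items():
        for metric, got, want in zip(METRICS, values, reported[name]):
            out.append((name, metric, got, want, abs(got - want) <= tol))
    return out


if __name__ == "__main__":
    total_support = sum(r[3] for r in PER_CLASS.values())
    print(f"total support: {total_support}")
    for name, metric, got, want, ok in compare(PER_CLASS, REPORTED):
        print(f"{name:>12} {metric:>9}: recomputed {got:.4f}, reported {want:.2f}, ok={ok}")
```

With these inputs the macro F1 comes out at 0.965 and the weighted F1 at about 0.972, both within rounding of the reported 0.97 and 0.98. The support-weighted recall (about 0.977) also lands next to the reported accuracy of 0.98: for single-label classification, weighting each class's recall by its support reduces to the number of correct predictions divided by the total, which is accuracy.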