Commit d87b3ea by kenhktsui (verified) · 1 parent: c2ba415

Update README.md

Files changed (1)
  1. README.md +12 -1
````diff
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ library_name: fasttext
 [Dataset](https://huggingface.co/datasets/kenhktsui/code-natural-language-classification-dataset)
 
 This classifier classifies a text into Code or NaturalLanguage.
-The model is trained over 3.24M records, which is a mix of code and natural langauge and achieved a test F1 score of 0.99.
+The model is trained over 3.24M records, which is a mix of code and natural langauge and achieved a test F1 score of 0.97.
 The classifier can be used for LLM pretraining data curation, to route a text into different pipeline (e.g. code syntax check).
 It is ultra fast ⚡ with a throughtput of ~2000 doc/s with CPU.
 
@@ -50,6 +50,17 @@ predict([
 # {'label': 'Code', 'score': 1.00001},
 # {'label': 'Code', 'score': 1.000009}]
 ```
+## 📊Evaluation
+```
+                 precision    recall  f1-score   support
+
+           Code       0.97      1.00      0.98    581282
+NaturalLanguage       1.00      0.92      0.95    228993
+
+       accuracy                           0.98    810275
+      macro avg       0.98      0.96      0.97    810275
+   weighted avg       0.98      0.98      0.98    810275
+```
 
 
 ## 📝Definition of Label
````
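The new headline figure (0.97) appears to correspond to the macro-averaged F1 row of the evaluation table rather than the weighted average (0.98). The sketch below recomputes the summary rows from the per-class precision, recall and support, assuming the standard definitions (F1 as the harmonic mean of precision and recall; macro average as the unweighted mean over classes; weighted average as the support-weighted mean), as used for example by scikit-learn's `classification_report`. This is an assumption about how the table was produced, not something the commit states. Because the inputs are already rounded to two decimals, individual per-class values recomputed this way can differ from the table in the last digit.

```python
# Per-class figures copied from the evaluation table (already rounded to 2 dp).
report = {
    "Code": {"precision": 0.97, "recall": 1.00, "support": 581282},
    "NaturalLanguage": {"precision": 1.00, "recall": 0.92, "support": 228993},
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


total = sum(row["support"] for row in report.values())
per_class_f1 = {label: f1(r["precision"], r["recall"]) for label, r in report.items()}

# Macro average: every class counts equally, regardless of size.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: each class's F1 is weighted by its support.
weighted_f1 = sum(per_class_f1[label] * report[label]["support"] for label in report) / total

# Accuracy: recall * support is the number of correct predictions per class.
accuracy = sum(r["recall"] * r["support"] for r in report.values()) / total

print(f"macro F1    = {macro_f1:.2f}")     # macro F1    = 0.97
print(f"weighted F1 = {weighted_f1:.2f}")  # weighted F1 = 0.98
print(f"accuracy    = {accuracy:.2f}")     # accuracy    = 0.98
```

The gap between the macro (0.97) and weighted (0.98) averages comes from class imbalance: Code makes up about 72% of the 810,275 test records, so its higher F1 dominates the weighted figure, while the lower NaturalLanguage recall (0.92) pulls the macro figure down.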