Update README.md
README.md
CHANGED
@@ -18,7 +18,8 @@ This exists for comparison to [BEE-spoke-data/wordpiece-tokenizer-32k-en_code-ms
 
 ## comparison vs BERT/mpnet tokenizer
 
-
+> [!NOTE]
+> `bert-base-uncased`'s tokenizer is the 'base tokenizer' in the below
 
 Total tokens in base tokenizer: 30527
 Total tokens in retrained tokenizer: 31999