Commit 91a5de2 (verified) · 1 parent: 26a753e
jakelever committed: Upload folder using huggingface_hub
README.md ADDED
@@ -0,0 +1,57 @@
+ ---
+ task: sequence-classification
+ tags:
+ - biomedical
+ - bionlp
+ - relation extraction
+ license: mit
+ base_model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
+ ---
+
+ # synthetic_relex model for biomedical relation extraction
+
+ This is a relation extraction model distilled from Llama 3.3 70B into a BERT model. It is a [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model fine-tuned on synthetic labels created by prompting Llama 3.3 70B with sentences from [PubTator Central](https://www.ncbi.nlm.nih.gov/research/pubtator3/). The dataset is available [here](https://huggingface.co/datasets/Glasgow-AI4BioMed/synthetic_relex).
+
+ **Note:** No humans were involved in annotating the dataset, so it may contain erroneous annotations. Detailed evaluation by human experts would be needed to gain an accurate view of the model's performance. The dataset and model offer a starting point for understanding and developing biomedical relation extraction models.
+
+ More information about the model and dataset can be found in the project repo: https://github.com/Glasgow-AI4BioMed/synthetic_relex
+
+ ## 🚀 Example Usage
+
+ The model classifies the relationship between two entities into one of 16 labels: affects_efficacy_of, binds_to, biomarker_for, causes, co_expressed_with, downregulates, inhibits, interacts_with, none, plays_causal_role_in, precursor_of, prevents, regulates, subtype_of, treats and upregulates.
+
+ To use the model, take the input text and wrap the first entity in [E1][/E1] tags and the second entity in [E2][/E2] tags, as in the example below. The classifier then outputs the predicted relation label with an associated score.
+
+ ```python
+ from transformers import pipeline
+
+ classifier = pipeline("text-classification", model="Glasgow-AI4BioMed/synthetic_relex")
+
+ classifier("[E1]Paclitaxel[/E1] is a common chemotherapy used for [E2]lung cancer[/E2].")
+
+ # Output:
+ # [{'label': 'treats', 'score': 0.9868311882019043}]
+ ```
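+
+ By default the pipeline returns only the top label. To inspect the full distribution over all 16 labels, the pipeline's `top_k` argument can be used (a minimal sketch reusing the `classifier` above):
+
+ ```python
+ # Return a score for every relation label, sorted from highest to lowest
+ classifier(
+     "[E1]Paclitaxel[/E1] is a common chemotherapy used for [E2]lung cancer[/E2].",
+     top_k=None,
+ )
+ # Output: a list of 16 {'label': ..., 'score': ...} dicts
+ ```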
+
+ ## 📈 Performance
+
+ The scores below are on the held-out test set; training and validation results are in performance_reports.md.
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | affects_efficacy_of | 0.473 | 0.296 | 0.364 | 1127 |
+ | binds_to | 0.541 | 0.266 | 0.357 | 492 |
+ | biomarker_for | 0.455 | 0.621 | 0.525 | 314 |
+ | causes | 0.667 | 0.571 | 0.615 | 3400 |
+ | co_expressed_with | 0.440 | 0.473 | 0.456 | 131 |
+ | downregulates | 0.472 | 0.481 | 0.477 | 106 |
+ | inhibits | 0.460 | 0.251 | 0.324 | 1429 |
+ | interacts_with | 0.469 | 0.310 | 0.373 | 1588 |
+ | none | 0.936 | 0.961 | 0.948 | 76442 |
+ | plays_causal_role_in | 0.343 | 0.426 | 0.380 | 202 |
+ | precursor_of | 0.462 | 0.212 | 0.291 | 113 |
+ | prevents | 0.602 | 0.504 | 0.548 | 135 |
+ | regulates | 0.504 | 0.509 | 0.506 | 116 |
+ | subtype_of | 0.382 | 0.521 | 0.441 | 286 |
+ | treats | 0.630 | 0.702 | 0.664 | 1000 |
+ | upregulates | 0.564 | 0.549 | 0.557 | 224 |
+ | **macro avg** | **0.525** | **0.478** | **0.489** | **87105** |
+ | **weighted avg** | **0.889** | **0.898** | **0.892** | **87105** |
added_tokens.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "[/E1]": 30523,
+   "[/E2]": 30525,
+   "[E1]": 30522,
+   "[E2]": 30524
+ }
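These four entity-marker tokens extend the base BERT vocabulary (hence "vocab_size": 30526 in config.json below). A minimal sketch to confirm the tokenizer resolves the markers to the IDs listed above, assuming the repo ID from the README:

```python
from transformers import AutoTokenizer

# Load the tokenizer shipped with this repo
tok = AutoTokenizer.from_pretrained("Glasgow-AI4BioMed/synthetic_relex")

# The entity markers should map to the IDs in added_tokens.json
print(tok.convert_tokens_to_ids(["[E1]", "[/E1]", "[E2]", "[/E2]"]))
# Expected: [30522, 30523, 30524, 30525]
```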
config.json ADDED
@@ -0,0 +1,43 @@
+ {
+   "_name_or_path": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
+   "architectures": [
+     "BertForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "affects_efficacy_of",
+     "1": "binds_to",
+     "2": "biomarker_for",
+     "3": "causes",
+     "4": "co_expressed_with",
+     "5": "downregulates",
+     "6": "inhibits",
+     "7": "interacts_with",
+     "8": "none",
+     "9": "plays_causal_role_in",
+     "10": "precursor_of",
+     "11": "prevents",
+     "12": "regulates",
+     "13": "subtype_of",
+     "14": "treats",
+     "15": "upregulates"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.1",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30526
+ }
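The "id2label" block ties the classifier's 16 output indices to relation names. A small sketch of reading the mapping programmatically; the printed values follow from the config above:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Glasgow-AI4BioMed/synthetic_relex")

print(config.num_labels)    # 16
print(config.id2label[14])  # 'treats' (transformers converts the JSON string keys to ints)
```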
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cd0df6b3e823de3c0a5a50265cec964e78833fc2dca073239c8ce27bc3e4bcfe
+ size 438014000
performance_reports.md ADDED
@@ -0,0 +1,68 @@
+ # Performance on Training Set
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | affects_efficacy_of | 0.874 | 0.658 | 0.751 | 3282 |
+ | binds_to | 0.888 | 0.757 | 0.817 | 967 |
+ | biomarker_for | 0.760 | 0.918 | 0.831 | 1081 |
+ | causes | 0.913 | 0.842 | 0.876 | 10861 |
+ | co_expressed_with | 0.921 | 0.843 | 0.880 | 605 |
+ | downregulates | 0.888 | 0.905 | 0.896 | 744 |
+ | inhibits | 0.903 | 0.691 | 0.783 | 4565 |
+ | interacts_with | 0.883 | 0.785 | 0.831 | 3495 |
+ | none | 0.976 | 0.988 | 0.982 | 217028 |
+ | plays_causal_role_in | 0.739 | 0.833 | 0.783 | 926 |
+ | precursor_of | 0.926 | 0.763 | 0.836 | 379 |
+ | prevents | 0.842 | 0.835 | 0.838 | 423 |
+ | regulates | 0.784 | 0.837 | 0.809 | 454 |
+ | subtype_of | 0.697 | 0.839 | 0.761 | 939 |
+ | treats | 0.871 | 0.919 | 0.895 | 2592 |
+ | upregulates | 0.845 | 0.900 | 0.871 | 1077 |
+ | macro avg | 0.857 | 0.832 | 0.840 | 249418 |
+ | weighted avg | 0.964 | 0.964 | 0.963 | 249418 |
+
+ # Performance on Validation Set
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | affects_efficacy_of | 0.466 | 0.252 | 0.327 | 994 |
+ | binds_to | 0.568 | 0.303 | 0.395 | 317 |
+ | biomarker_for | 0.476 | 0.626 | 0.541 | 441 |
+ | causes | 0.659 | 0.542 | 0.595 | 3513 |
+ | co_expressed_with | 0.574 | 0.251 | 0.350 | 370 |
+ | downregulates | 0.658 | 0.571 | 0.612 | 371 |
+ | inhibits | 0.502 | 0.319 | 0.390 | 1127 |
+ | interacts_with | 0.550 | 0.366 | 0.440 | 1385 |
+ | none | 0.931 | 0.960 | 0.946 | 70920 |
+ | plays_causal_role_in | 0.391 | 0.411 | 0.401 | 253 |
+ | precursor_of | 0.541 | 0.384 | 0.449 | 172 |
+ | prevents | 0.645 | 0.562 | 0.601 | 178 |
+ | regulates | 0.474 | 0.522 | 0.497 | 178 |
+ | subtype_of | 0.352 | 0.454 | 0.397 | 271 |
+ | treats | 0.650 | 0.657 | 0.654 | 866 |
+ | upregulates | 0.585 | 0.539 | 0.561 | 395 |
+ | macro avg | 0.564 | 0.483 | 0.510 | 81751 |
+ | weighted avg | 0.884 | 0.894 | 0.887 | 81751 |
+
+ # Performance on Test Set
+
+ | Label | Precision | Recall | F1-score | Support |
+ | --- | --- | --- | --- | --- |
+ | affects_efficacy_of | 0.473 | 0.296 | 0.364 | 1127 |
+ | binds_to | 0.541 | 0.266 | 0.357 | 492 |
+ | biomarker_for | 0.455 | 0.621 | 0.525 | 314 |
+ | causes | 0.667 | 0.571 | 0.615 | 3400 |
+ | co_expressed_with | 0.440 | 0.473 | 0.456 | 131 |
+ | downregulates | 0.472 | 0.481 | 0.477 | 106 |
+ | inhibits | 0.460 | 0.251 | 0.324 | 1429 |
+ | interacts_with | 0.469 | 0.310 | 0.373 | 1588 |
+ | none | 0.936 | 0.961 | 0.948 | 76442 |
+ | plays_causal_role_in | 0.343 | 0.426 | 0.380 | 202 |
+ | precursor_of | 0.462 | 0.212 | 0.291 | 113 |
+ | prevents | 0.602 | 0.504 | 0.548 | 135 |
+ | regulates | 0.504 | 0.509 | 0.506 | 116 |
+ | subtype_of | 0.382 | 0.521 | 0.441 | 286 |
+ | treats | 0.630 | 0.702 | 0.664 | 1000 |
+ | upregulates | 0.564 | 0.549 | 0.557 | 224 |
+ | macro avg | 0.525 | 0.478 | 0.489 | 87105 |
+ | weighted avg | 0.889 | 0.898 | 0.892 | 87105 |
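Across all three splits the weighted averages sit far above the macro averages because the "none" class dominates the support counts (76442 of 87105 test instances). A small illustrative computation, using two labels from the test table, of why weighted averaging tracks the dominant class:

```python
# (F1, support) pairs taken from the test-set table above
per_label = {"treats": (0.664, 1000), "none": (0.948, 76442)}

# Macro: unweighted mean over labels; weighted: mean weighted by support
macro = sum(f1 for f1, _ in per_label.values()) / len(per_label)
weighted = sum(f1 * n for f1, n in per_label.values()) / sum(n for _, n in per_label.values())

print(round(macro, 3), round(weighted, 3))  # 0.806 0.944 -- 'none' dominates the weighted score
```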
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render.
 
tokenizer_config.json ADDED
@@ -0,0 +1,90 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "30522": {
+       "content": "[E1]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "30523": {
+       "content": "[/E1]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "30524": {
+       "content": "[E2]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     },
+     "30525": {
+       "content": "[/E2]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": false
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
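Because the entity markers are registered in "added_tokens_decoder", they should survive tokenization as single units rather than being split into wordpieces. A quick check (a sketch; the exact ID sequence depends on the tokenizer):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Glasgow-AI4BioMed/synthetic_relex")

enc = tok("[E1]Paclitaxel[/E1] is a common chemotherapy used for [E2]lung cancer[/E2].")
print(enc.input_ids)         # marker IDs 30522-30525 should appear intact among the wordpiece IDs
print(tok.model_max_length)  # 512, matching model_max_length above
```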
vocab.txt ADDED
The diff for this file is too large to render.