Upload folder using huggingface_hub

- 1_Pooling/config.json +7 -0
- README.md +124 -1
- config.json +24 -0
- config_sentence_transformers.json +7 -0
- eval/similarity_evaluation_results.csv +46 -0
- model.safetensors +3 -0
- modules.json +14 -0
- sentence_bert_config.json +4 -0
- special_tokens_map.json +51 -0
- tokenizer.json +0 -0
- tokenizer_config.json +65 -0
- vocab.txt +0 -0

1_Pooling/config.json
ADDED
@@ -0,0 +1,7 @@
+{
+  "word_embedding_dimension": 768,
+  "pooling_mode_cls_token": false,
+  "pooling_mode_mean_tokens": true,
+  "pooling_mode_max_tokens": false,
+  "pooling_mode_mean_sqrt_len_tokens": false
+}
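
The pooling config enables only `pooling_mode_mean_tokens`, so each 768-dimensional sentence vector is the average of its non-padding token vectors. To show what each flag computes, here is a pure-Python sketch on toy 2-dimensional data. It is not the sentence-transformers `Pooling` code: `pool` is a hypothetical name, and the concatenation order used when several flags are enabled is an assumption.

```python
import math
from typing import Dict, List

Vector = List[float]


def pool(token_embeddings: List[Vector], attention_mask: List[int], config: Dict) -> Vector:
    """Pool per-token vectors into one sentence vector, driven by a
    1_Pooling/config.json-style dict. Padded positions (mask 0) are ignored."""
    dim = config["word_embedding_dimension"]
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    if any(len(vec) != dim for vec in kept):
        raise ValueError("token vectors do not match word_embedding_dimension")

    sums = [sum(vec[i] for vec in kept) for i in range(dim)]
    parts: List[Vector] = []
    # Assumed order when several modes are enabled: cls, max, mean, mean_sqrt_len.
    if config.get("pooling_mode_cls_token"):
        parts.append(list(token_embeddings[0]))  # first token (<s> for MPNet)
    if config.get("pooling_mode_max_tokens"):
        parts.append([max(vec[i] for vec in kept) for i in range(dim)])
    if config.get("pooling_mode_mean_tokens"):
        parts.append([s / len(kept) for s in sums])
    if config.get("pooling_mode_mean_sqrt_len_tokens"):
        parts.append([s / math.sqrt(len(kept)) for s in sums])
    # Concatenate the enabled modes into one output vector.
    return [x for part in parts for x in part]


# Toy example: 2-dimensional "embeddings" instead of 768; the third token is padding.
toy_config = {
    "word_embedding_dimension": 2,
    "pooling_mode_cls_token": False,
    "pooling_mode_mean_tokens": True,
    "pooling_mode_max_tokens": False,
    "pooling_mode_mean_sqrt_len_tokens": False,
}
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(pool(tokens, mask, toy_config))
```

The padded `[100.0, 100.0]` vector has no effect on the result, which is the point of masking before averaging.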

README.md
CHANGED
@@ -1,3 +1,126 @@
 ---
-
+pipeline_tag: sentence-similarity
+tags:
+- sentence-transformers
+- feature-extraction
+- sentence-similarity
+- transformers
+
 ---
+
+# {MODEL_NAME}
+
+This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
+
+<!--- Describe your model here -->
+
+## Usage (Sentence-Transformers)
+
+Using this model becomes easy when you have [sentence-transformers](https://www.SBERT.net) installed:
+
+```
+pip install -U sentence-transformers
+```
+
+Then you can use the model like this:
+
+```python
+from sentence_transformers import SentenceTransformer
+sentences = ["This is an example sentence", "Each sentence is converted"]
+
+model = SentenceTransformer('{MODEL_NAME}')
+embeddings = model.encode(sentences)
+print(embeddings)
+```
+
+
+
+## Usage (HuggingFace Transformers)
+Without [sentence-transformers](https://www.SBERT.net), you can use the model like this: first, pass your input through the transformer model, then apply the right pooling operation on top of the contextualized word embeddings.
+
+```python
+from transformers import AutoTokenizer, AutoModel
+import torch
+
+
+# Mean Pooling - take the attention mask into account for correct averaging
+def mean_pooling(model_output, attention_mask):
+    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
+    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
+    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
+
+
+# Sentences we want sentence embeddings for
+sentences = ['This is an example sentence', 'Each sentence is converted']
+
+# Load model from HuggingFace Hub
+tokenizer = AutoTokenizer.from_pretrained('{MODEL_NAME}')
+model = AutoModel.from_pretrained('{MODEL_NAME}')
+
+# Tokenize sentences
+encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
+
+# Compute token embeddings
+with torch.no_grad():
+    model_output = model(**encoded_input)
+
+# Perform pooling. In this case, mean pooling.
+sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
+
+print("Sentence embeddings:")
+print(sentence_embeddings)
+```
+
+
+
+## Evaluation Results
+
+<!--- Describe how your model was evaluated -->
+
+For an automated evaluation of this model, see the *Sentence Embeddings Benchmark*: [https://seb.sbert.net](https://seb.sbert.net?model_name={MODEL_NAME})
+
+
+## Training
+The model was trained with the following parameters:
+
+**DataLoader**:
+
+`torch.utils.data.dataloader.DataLoader` of length 16082 with parameters:
+```
+{'batch_size': 24, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
+```
+
+**Loss**:
+
+`sentence_transformers.losses.CosineSimilarityLoss.CosineSimilarityLoss`
+
+Parameters of the fit() method:
+```
+{
+    "epochs": 5,
+    "evaluation_steps": 2000,
+    "evaluator": "sentence_transformers.evaluation.EmbeddingSimilarityEvaluator.EmbeddingSimilarityEvaluator",
+    "max_grad_norm": 1,
+    "optimizer_class": "<class 'torch.optim.adamw.AdamW'>",
+    "optimizer_params": {
+        "lr": 2e-05
+    },
+    "scheduler": "WarmupLinear",
+    "steps_per_epoch": null,
+    "warmup_steps": 100,
+    "weight_decay": 0.01
+}
+```
+
+
+## Full Model Architecture
+```
+SentenceTransformer(
+  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: MPNetModel
+  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
+)
+```
+
+## Citing & Authors
+
+<!--- Describe where people can find more information -->
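
The README says the embeddings can be used for semantic search, and the card's `pipeline_tag` is `sentence-similarity`. As a minimal sketch of that use, the code below ranks corpus vectors against a query by cosine similarity. The 3-dimensional vectors are toy stand-ins (an assumption) for the 768-dimensional arrays that `model.encode(...)` returns, and `rank_by_similarity` is a hypothetical helper; sentence-transformers ships its own utilities for this, such as `sentence_transformers.util.cos_sim`.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(query: Sequence[float], corpus: List[Sequence[float]]) -> List[Tuple[int, float]]:
    """Return (corpus index, score) pairs, most similar first."""
    scores = [(i, cosine(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-d vectors standing in for the 768-d output of model.encode(...).
corpus_vectors = [
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.7, 0.7, 0.0],
]
query_vector = [1.0, 0.0, 0.0]
for idx, score in rank_by_similarity(query_vector, corpus_vectors):
    print(idx, round(score, 3))
```

Cosine similarity is used here because it is the similarity the model was trained against (see the loss in the Training section); for a large corpus, a vectorized or indexed search would replace the Python loop.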
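
The README lists `CosineSimilarityLoss` as the training objective. In sentence-transformers this loss embeds both sentences of a pair, takes the cosine similarity of the two vectors, and minimizes its mean squared error against the pair's float label (gold scores are typically scaled to 0..1). The pure-Python sketch below shows that objective on toy vectors; it omits the encoder and gradients, and `cosine_similarity_loss` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cosine_similarity_loss(pairs: List[Pair], labels: List[float]) -> float:
    """Mean squared error between cos(u, v) and the gold similarity label of each pair."""
    if not pairs or len(pairs) != len(labels):
        raise ValueError("need one label per embedding pair")
    squared_errors = [(cosine(u, v) - y) ** 2 for (u, v), y in zip(pairs, labels)]
    return sum(squared_errors) / len(squared_errors)


# Toy batch: an identical pair and an orthogonal pair.
batch = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
print(cosine_similarity_loss(batch, [1.0, 1.0]))  # orthogonal pair labelled "similar" is penalized
print(cosine_similarity_loss(batch, [1.0, 0.0]))  # labels match the geometry exactly
```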

config.json
ADDED
@@ -0,0 +1,24 @@
+{
+  "_name_or_path": "microsoft/mpnet-base",
+  "architectures": [
+    "MPNetModel"
+  ],
+  "attention_probs_dropout_prob": 0.1,
+  "bos_token_id": 0,
+  "eos_token_id": 2,
+  "hidden_act": "gelu",
+  "hidden_dropout_prob": 0.1,
+  "hidden_size": 768,
+  "initializer_range": 0.02,
+  "intermediate_size": 3072,
+  "layer_norm_eps": 1e-05,
+  "max_position_embeddings": 514,
+  "model_type": "mpnet",
+  "num_attention_heads": 12,
+  "num_hidden_layers": 12,
+  "pad_token_id": 1,
+  "relative_attention_num_buckets": 32,
+  "torch_dtype": "float32",
+  "transformers_version": "4.35.2",
+  "vocab_size": 30527
+}
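
As a sanity check on `config.json`, the sketch below estimates the encoder's parameter count from its hyperparameters. The layer layout is an assumption, not something this commit states: each of the 12 layers is taken to hold four 768×768 attention projections, two LayerNorms and a 768→3072→768 feed-forward block; one 32×12 relative-position bias table is shared across layers; and a 768×768 pooler may or may not be stored. Under those assumptions the total is about 109.5M parameters, in line with the 437,967,672-byte `model.safetensors` pointer at 4 bytes per float32 parameter. `mpnet_param_estimate` is a hypothetical helper.

```python
from typing import Dict

# Values copied from config.json above.
MPNET_CONFIG: Dict[str, int] = {
    "hidden_size": 768,
    "intermediate_size": 3072,
    "max_position_embeddings": 514,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "relative_attention_num_buckets": 32,
    "vocab_size": 30527,
}


def mpnet_param_estimate(cfg: Dict[str, int], include_pooler: bool = True) -> int:
    """Count weights and biases for an MPNet-style encoder under the assumed layout."""
    h, ff = cfg["hidden_size"], cfg["intermediate_size"]
    embeddings = (
        cfg["vocab_size"] * h                 # token embedding table
        + cfg["max_position_embeddings"] * h  # absolute position embedding table
        + 2 * h                               # embedding LayerNorm (weight + bias)
    )
    # One relative-position bias table shared by all layers: buckets x heads.
    relative_bias = cfg["relative_attention_num_buckets"] * cfg["num_attention_heads"]
    per_layer = (
        4 * (h * h + h)  # query, key, value and attention-output projections
        + 2 * h          # LayerNorm after attention
        + (h * ff + ff)  # feed-forward expansion
        + (ff * h + h)   # feed-forward projection back to hidden size
        + 2 * h          # LayerNorm after the feed-forward block
    )
    total = embeddings + relative_bias + cfg["num_hidden_layers"] * per_layer
    if include_pooler:
        total += h * h + h  # dense pooler layer
    return total


with_pooler = mpnet_param_estimate(MPNET_CONFIG)
without_pooler = mpnet_param_estimate(MPNET_CONFIG, include_pooler=False)
print(f"{with_pooler:,} parameters ({with_pooler * 4:,} bytes as float32)")
print(f"{without_pooler:,} parameters without the pooler")
```

The word-embedding table alone (30,527 × 768, about 23.4M values) accounts for roughly a fifth of the total.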

config_sentence_transformers.json
ADDED
@@ -0,0 +1,7 @@
+{
+  "__version__": {
+    "sentence_transformers": "2.2.2",
+    "transformers": "4.35.2",
+    "pytorch": "2.1.1+cu118"
+  }
+}

eval/similarity_evaluation_results.csv
ADDED
@@ -0,0 +1,46 @@
+epoch,steps,cosine_pearson,cosine_spearman,euclidean_pearson,euclidean_spearman,manhattan_pearson,manhattan_spearman,dot_pearson,dot_spearman
+0,2000,0.5112725797696669,0.5079868037535843,0.48407236798428055,0.4888687201381017,0.4835558201917416,0.48885871046717744,0.5145130445495018,0.5110862693283496
+0,4000,0.6111641357112114,0.6064641321000418,0.5847972484391026,0.5884899560079203,0.5830138084208017,0.5874336656750055,0.6134437127585878,0.6099162221787197
+0,6000,0.6579722120081701,0.6563783279464613,0.6276196935734649,0.6391973308484455,0.6264648475003323,0.6383539032144969,0.6563447061022429,0.6577615125374133
+0,8000,0.6978412971860155,0.6946973866807362,0.6657953105465442,0.674151098489152,0.6634713423748253,0.6709172930480223,0.6989396300286976,0.6983819641273818
+0,10000,0.71664126625434,0.716561126793274,0.6833448007845064,0.705144248042718,0.6807636725890266,0.6997964306348234,0.7133170005452343,0.714783250468296
+0,12000,0.7447759343821916,0.7437559609542774,0.7057193238976673,0.7218290856032596,0.705497095738836,0.7204414623025513,0.7420934145344666,0.7452860937305911
+0,14000,0.74915674561808,0.749484574409083,0.708469028768564,0.7295175744167467,0.708862755332043,0.7294867420532651,0.7471797393236392,0.7512694531694221
+0,16000,0.7707121365174059,0.771329106264553,0.7294723575753942,0.7488274869183182,0.7288860458749342,0.7471076951125543,0.7701621878183128,0.7744463370749671
+0,-1,0.7673140859747135,0.7667790984570396,0.7263377249282992,0.7456929312078262,0.7253748518648819,0.7434909313501022,0.7655486588557363,0.7686533666157107
+1,2000,0.7819429938266307,0.78094704409873,0.7416114293224294,0.7595531767831939,0.7421616332537951,0.7592690310441832,0.7808248411660548,0.782697255624755
+1,4000,0.7882986608457935,0.7868252544880076,0.7467243268522838,0.7645963169916876,0.7466242729955777,0.764125914046692,0.7866295591608243,0.7883454930504136
+1,6000,0.8002238821079108,0.7982344505789355,0.7634717460588625,0.7829702832791225,0.7624982949030922,0.7807306940679162,0.7980618888927581,0.7984140644466148
+1,8000,0.8070599201833404,0.8071643755565566,0.7643932383599533,0.7887840170851298,0.7635462817966631,0.7866299670484965,0.8049820389033089,0.807796628000849
+1,10000,0.8123730342226898,0.81204642594481,0.7724022056326567,0.7968143140466736,0.771142252372132,0.7933508069884749,0.8116375114831246,0.8125024701901095
+1,12000,0.8171761556472309,0.8191539356024743,0.7721963563242908,0.7988008804957016,0.771338536586359,0.7967986160515254,0.8167556722295175,0.8211600422619921
+1,14000,0.8217200888451738,0.8229779612489727,0.7737227776076485,0.8001481560538273,0.7733634963086969,0.7984669782070695,0.8199149290350003,0.8243445697613914
+1,16000,0.8265222758832693,0.8285146250871479,0.7813682048404436,0.8108495176264491,0.78090914547678,0.8091313952280513,0.8251643004911461,0.829006408560849
+1,-1,0.8262303306713485,0.8282679607936951,0.7790294533575797,0.805921674845624,0.7794735365266164,0.8060717099219609,0.8239888333820669,0.8286823805230532
+2,2000,0.8286784496459367,0.8310504737694457,0.7819573759907552,0.812093169166234,0.7816215080149245,0.8101384623231614,0.8274909327134539,0.8317846237617296
+2,4000,0.835616205220834,0.8360943474053417,0.7894847068354555,0.8169915674633668,0.7896787390309352,0.8160661613056649,0.8336247110613197,0.8357800613366163
+2,6000,0.8398343497156494,0.8399175926927265,0.7950891950547772,0.8212798700074329,0.7954691838435125,0.8205536777256416,0.8375575531504742,0.8396359868478035
+2,8000,0.8438018637837216,0.8438270986496463,0.795149045551288,0.8203054496547696,0.7958467395245419,0.820601422312527,0.8410070432782604,0.8436613413744672
+2,10000,0.8468480597007373,0.8476039194230917,0.8001936746003268,0.8268160795644668,0.8004946384225324,0.8264367841395094,0.8443737172866908,0.8477115853297948
+2,12000,0.8502813607854378,0.8509179756799067,0.80550666725913,0.831197929706751,0.8051254121677959,0.8298668456970126,0.8476986276978584,0.850567279598252
+2,14000,0.8539073568193281,0.8553464729662648,0.8087714992065229,0.8371078647164694,0.8088870793702209,0.8361343456763813,0.8528162190230996,0.8556900064846321
+2,16000,0.8544573177596062,0.8580720727688758,0.8037740803189763,0.8373612490337308,0.8039765512515591,0.8362046535013997,0.8535004359962781,0.8582169400534592
+2,-1,0.8564469620604049,0.8583573088500128,0.8092963756732118,0.8394582862307234,0.8095295258357031,0.8383808437647896,0.8537510545413085,0.8576369640121451
+3,2000,0.858335492426398,0.8598730511158054,0.8109701818577197,0.8388938096664709,0.8115046245911267,0.8388572580320608,0.857496614165324,0.8603503131250824
+3,4000,0.8585720362404455,0.8608787200229299,0.8094691553741892,0.8406964015823338,0.8100279877484762,0.8401941017763299,0.8560533825031614,0.8599173187451638
+3,6000,0.8625065340034024,0.8643877033313818,0.8147635598777612,0.8439370969066217,0.815024717502594,0.8432979741313638,0.8610900947663982,0.8640921848075493
+3,8000,0.8636424980300432,0.865122090592666,0.8173109962050679,0.845700575720838,0.8182284738793051,0.8459612367985541,0.8613495822247048,0.8641951639601717
+3,10000,0.8650135805176045,0.8673640283370629,0.816476323494592,0.8466196160872734,0.8169465400233672,0.8462270727279998,0.8634553032248831,0.8670114328838054
+3,12000,0.8660554514616089,0.8675529876534124,0.820467698385613,0.8491929868057942,0.8206915904174914,0.848456541205093,0.8637250810237758,0.8665606617213012
+3,14000,0.869624306329827,0.8718296409176024,0.8209503097190902,0.8506831769725806,0.8219506623807201,0.8510081930208988,0.8674419506031797,0.8709930565184041
+3,16000,0.8704396511213788,0.8735845147650085,0.8213771401822823,0.8531587401746457,0.821977454760251,0.8527795805352297,0.8698618513314226,0.8733708916919545
+3,-1,0.8700336133151265,0.8726758833679342,0.8204336806274579,0.8520600699024211,0.8215392375394879,0.8524818449200231,0.8686024548768527,0.8720263802394267
+4,2000,0.87206925200801,0.8739807045524346,0.8261164267655331,0.8544884307137518,0.8271211846423038,0.8547683535320902,0.8705259947562897,0.8734331721011405
+4,4000,0.8729561300627455,0.8755517209603371,0.8236263307824584,0.8544608040818665,0.8243894920098801,0.8546285432975391,0.8718757908696867,0.8752564718789628
+4,6000,0.8733050986710387,0.8763342021631335,0.8229918605011646,0.8544769984685667,0.8241693828088917,0.8550171857019131,0.872459812828531,0.8762565503867752
+4,8000,0.8747918752753512,0.8776918490177567,0.8266396911478837,0.8576142866877868,0.8273975347484257,0.8575592743630189,0.8737816762278383,0.8772673195672135
+4,10000,0.8756954383154001,0.8785020324188557,0.826701497021203,0.8580297266019579,0.8275847946482253,0.8580523517597078,0.8740733998601564,0.8777094913541181
+4,12000,0.8766918428682116,0.8791700693786121,0.8287402810383075,0.8591466484870882,0.8295960716735256,0.8592141541249873,0.8749246498622072,0.8780804557469476
+4,14000,0.877598616280475,0.8800782071546932,0.829796344766758,0.8600729219172725,0.8306199404272812,0.8601873712190522,0.8759751366450554,0.8789839862502832
+4,16000,0.8776365492614989,0.8802169350974743,0.8292031255117502,0.8601198002110558,0.830114079320727,0.8603507576377725,0.8759709022137859,0.8791947439523577
+4,-1,0.8776310070951263,0.8802108509348683,0.8291788384293556,0.860103499714235,0.8300926509841282,0.8603385907137475,0.8759596196632796,0.8791872747352656
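
Each CSV row records how closely the model's similarity scores track the gold scores of the evaluation set. In sentence-transformers 2.x, `EmbeddingSimilarityEvaluator` (the evaluator named in the README) scores every evaluation pair four ways: cosine similarity, negated Euclidean distance, negated Manhattan distance and dot product. It then reports the Pearson and Spearman correlation of each with the gold labels, which gives the eight metric columns. The last row reaches a cosine Spearman of about 0.880. Below is a pure-Python sketch of the two correlations on hypothetical scores; Spearman is Pearson computed on ranks, with tied values sharing their average rank. The function names are illustrative, not the library's.

```python
import math
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)) * math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / spread


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical cosine scores from a model and gold similarity labels for four pairs.
predicted = [0.91, 0.40, 0.75, 0.10]
gold = [1.0, 0.2, 0.8, 0.3]
print(round(pearson(predicted, gold), 4), round(spearman(predicted, gold), 4))
```

Spearman only rewards getting the order right, which is why it is the usual headline metric for similarity benchmarks.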
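
The README's training block gives `"scheduler": "WarmupLinear"`, 100 warmup steps, a base learning rate of 2e-05, 5 epochs, and a DataLoader of 16,082 batches of 24 (roughly 386,000 training pairs). Assuming one optimizer step per batch, that is 80,410 steps in total. The `steps` column in the CSV restarts every epoch (2000 to 16000), and in sentence-transformers 2.x a value of -1 marks the evaluation run at the end of an epoch. The sketch reproduces the linear-warmup, linear-decay shape that sentence-transformers maps "WarmupLinear" to (via `transformers.get_linear_schedule_with_warmup`); `warmup_linear_factor` is a hypothetical name.

```python
def warmup_linear_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Learning-rate multiplier: linear ramp 0 -> 1 during warmup, then linear decay to 0."""
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 2e-05          # optimizer_params.lr from the README
WARMUP_STEPS = 100       # warmup_steps from the README
STEPS_PER_EPOCH = 16082  # DataLoader length from the README (assumed one step per batch)
EPOCHS = 5
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS  # 80,410

for epoch in range(EPOCHS):
    # Global optimizer step reached when the epoch ends (the "steps = -1" rows).
    end_step = (epoch + 1) * STEPS_PER_EPOCH
    lr = BASE_LR * warmup_linear_factor(end_step, WARMUP_STEPS, TOTAL_STEPS)
    print(f"epoch {epoch}: global step {end_step}, lr {lr:.2e}")
```

With only 100 warmup steps out of 80,410, the schedule is almost entirely a linear decay, reaching zero exactly at the last step of epoch 4.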

model.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:72d7dcc200ac484762759f559d2d6baf469f915f66a051a706478cd8e302eaa4
+size 437967672
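
`model.safetensors` is stored with Git LFS, so the diff shows only the three-line pointer file (spec version, SHA-256 object id, size in bytes), not the weights themselves. Because `config.json` records `"torch_dtype": "float32"` (4 bytes per parameter), the 437,967,672-byte size caps the parameter count at roughly 109.5M; the safetensors header takes up a small part of the file. The sketch parses the pointer; `parse_lfs_pointer` is a hypothetical helper, not part of `huggingface_hub` or Git LFS.

```python
from typing import Dict

# The pointer file exactly as committed.
POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:72d7dcc200ac484762759f559d2d6baf469f915f66a051a706478cd8e302eaa4
size 437967672
"""


def parse_lfs_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a Git LFS pointer file into a dict."""
    fields: Dict[str, str] = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


fields = parse_lfs_pointer(POINTER)
size_bytes = int(fields["size"])
algo, _, digest = fields["oid"].partition(":")
# float32 uses 4 bytes per parameter; the file header makes this an upper bound.
max_params = size_bytes // 4
print(algo, len(digest), f"{size_bytes / 1e6:.1f} MB", f"<= {max_params:,} parameters")
```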

modules.json
ADDED
@@ -0,0 +1,14 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  }
+]
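
`modules.json` lists the modules that `SentenceTransformer` chains together, in `idx` order: an empty `path` means the Transformer module's files (`config.json`, tokenizer files, weights) sit at the repository root, while the Pooling module reads `1_Pooling/config.json`. This matches the "Full Model Architecture" block in the README. The sketch only reads the JSON and prints the pipeline; it does not load any weights, and `describe_pipeline` is a hypothetical helper.

```python
import json
from typing import List, Tuple

# Contents of modules.json from this commit.
MODULES_JSON = """
[
  {"idx": 0, "name": "0", "path": "", "type": "sentence_transformers.models.Transformer"},
  {"idx": 1, "name": "1", "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"}
]
"""


def describe_pipeline(modules_json: str) -> List[Tuple[int, str, str]]:
    """List (idx, module class, folder) in the order the modules run."""
    modules = sorted(json.loads(modules_json), key=lambda m: m["idx"])
    return [
        (m["idx"], m["type"].rsplit(".", 1)[-1], m["path"] or "<repository root>")
        for m in modules
    ]


for idx, cls_name, folder in describe_pipeline(MODULES_JSON):
    print(f"{idx}: {cls_name} (files in {folder})")
```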

sentence_bert_config.json
ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 256,
+  "do_lower_case": false
+}
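
`sentence_bert_config.json` sets `max_seq_length` to 256, half the tokenizer's `model_max_length` of 512, so longer inputs are truncated before encoding. The sketch below illustrates the single-sentence layout `<s> ... </s>` (ids 0 and 2, per `bos_token_id` and `eos_token_id` in `config.json`) under that cap: only the first 254 word pieces survive. It assumes right-side truncation (the Hugging Face tokenizer default) and uses made-up word-piece ids; `build_input_ids` is hypothetical and not the tokenizer's own code.

```python
from typing import List

MAX_SEQ_LENGTH = 256  # from sentence_bert_config.json
CLS_ID = 0            # "<s>": bos_token_id in config.json, cls_token in tokenizer_config.json
SEP_ID = 2            # "</s>": eos_token_id in config.json, sep_token in tokenizer_config.json


def build_input_ids(content_ids: List[int], max_len: int = MAX_SEQ_LENGTH) -> List[int]:
    """Wrap word-piece ids as <s> ... </s>, keeping the start so the total fits max_len."""
    budget = max_len - 2  # two positions are reserved for the special tokens
    return [CLS_ID] + content_ids[:budget] + [SEP_ID]


long_input = list(range(1000, 1600))  # 600 hypothetical word-piece ids
ids = build_input_ids(long_input)
print(len(ids), ids[:3], ids[-2:])
```

Anything past the cap never reaches the encoder, so long documents are usually split into chunks and embedded separately.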

special_tokens_map.json
ADDED
@@ -0,0 +1,51 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "cls_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "mask_token": {
+    "content": "<mask>",
+    "lstrip": true,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "sep_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "[UNK]",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

tokenizer.json
ADDED
The diff for this file is too large to render.

tokenizer_config.json
ADDED
@@ -0,0 +1,65 @@
+{
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "3": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "104": {
+      "content": "[UNK]",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "30526": {
+      "content": "<mask>",
+      "lstrip": true,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": true,
+  "cls_token": "<s>",
+  "do_lower_case": true,
+  "eos_token": "</s>",
+  "mask_token": "<mask>",
+  "model_max_length": 512,
+  "pad_token": "<pad>",
+  "sep_token": "</s>",
+  "strip_accents": null,
+  "tokenize_chinese_chars": true,
+  "tokenizer_class": "MPNetTokenizer",
+  "unk_token": "[UNK]"
+}
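
Two details in `tokenizer_config.json` are easy to miss: `unk_token` is `[UNK]` (id 104), not the `<unk>` entry at id 3, and `<mask>` sits at id 30526, the last index allowed by `"vocab_size": 30527` in `config.json`. The sketch resolves each special-token role to its id from `added_tokens_decoder`. The tables are copied from the file above, reduced to id → text, and `resolve_special_ids` is a hypothetical helper rather than the `transformers` tokenizer API.

```python
from typing import Dict

# "added_tokens_decoder" from tokenizer_config.json, reduced to id -> token text.
ADDED_TOKENS_DECODER: Dict[str, str] = {
    "0": "<s>",
    "1": "<pad>",
    "2": "</s>",
    "3": "<unk>",
    "104": "[UNK]",
    "30526": "<mask>",
}
# Top-level special-token roles from the same file.
SPECIAL_TOKENS: Dict[str, str] = {
    "bos_token": "<s>",
    "cls_token": "<s>",
    "eos_token": "</s>",
    "sep_token": "</s>",
    "pad_token": "<pad>",
    "mask_token": "<mask>",
    "unk_token": "[UNK]",
}
VOCAB_SIZE = 30527  # from config.json


def resolve_special_ids(decoder: Dict[str, str], roles: Dict[str, str]) -> Dict[str, int]:
    """Map each special-token role to its integer id via the decoder table."""
    token_to_id = {text: int(idx) for idx, text in decoder.items()}
    return {role: token_to_id[text] for role, text in roles.items()}


special_ids = resolve_special_ids(ADDED_TOKENS_DECODER, SPECIAL_TOKENS)
for role, token_id in sorted(special_ids.items()):
    print(f"{role:>10}: {token_id}")
# Every special id must be a valid row of the embedding table.
assert max(special_ids.values()) < VOCAB_SIZE
```

The pad id resolved here (1) agrees with `"pad_token_id": 1` in `config.json`, which is what the attention mask in the README's mean pooling relies on.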

vocab.txt
ADDED
The diff for this file is too large to render.