MaziyarPanahi committed on
Commit
d048128
·
verified ·
1 Parent(s): 55fd077

feat: Upload fine-tuned medical NER model OpenMed-NER-GenomicDetect-EuroMed-212M

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ openmed_vs_sota_grouped_bars.png filter=lfs diff=lfs merge=lfs -text
37
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,242 @@
1
+ ---
2
+ widget:
3
+ - text: "The BRCA2 gene is associated with hereditary breast cancer."
4
+ - text: "Mutations in the CFTR gene cause cystic fibrosis."
5
+ - text: "The APOE gene variant affects Alzheimer's disease risk."
6
+ - text: "The HTT gene provides instructions for making a protein called huntingtin."
7
+ - text: "Sickle cell disease is caused by a mutation in the HBB gene."
8
+ tags:
9
+ - token-classification
10
+ - named-entity-recognition
11
+ - biomedical-nlp
12
+ - transformers
13
+ - gene-recognition
14
+ - genetics
15
+ - genomics
16
+ - molecular-biology
17
+ - cell-line-name
18
+ language:
19
+ - en
20
+ license: apache-2.0
21
+ ---
22
+
23
+ # 🧬 [OpenMed-NER-GenomicDetect-EuroMed-212M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-EuroMed-212M)
24
+
25
+ **Specialized model for Gene Entity Recognition - Gene-related entities**
26
+
27
+ [![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
28
+ [![Python](https://img.shields.io/badge/Python-3.8%2B-blue)]()
29
+ [![Transformers](https://img.shields.io/badge/🤗-Transformers-yellow)]()
30
+ [![OpenMed](https://img.shields.io/badge/🏥-OpenMed-green)](https://huggingface.co/OpenMed)
31
+
32
+ ## 📋 Model Overview
33
+
34
+ This model is a fine-tuned transformer for the task this card calls gene entity recognition, trained on the GELLUS corpus. Its label set covers a single entity type, cell line names (`B-Cell-line-name`, `I-Cell-line-name`), which it extracts from research papers and other biomedical text. Typical applications include **literature mining**, **genomics text mining**, and **biomedical knowledge graph construction**.
35
+
36
+ ### 🎯 Key Features
37
+ - **High Precision**: Optimized for biomedical entity recognition
38
+ - **Domain-Specific**: Trained on curated GELLUS dataset
39
+ - **Production-Ready**: Validated on clinical benchmarks
40
+ - **Easy Integration**: Compatible with Hugging Face Transformers ecosystem
41
+
42
+ ### 🏷️ Supported Entity Types
43
+
44
+ This model can identify and classify the following biomedical entities:
45
+
46
+ - `B-Cell-line-name`
47
+ - `I-Cell-line-name`
48
+
49
+ ## 📊 Dataset
50
+
51
+ The Gellus corpus targets gene recognition and genetics entities for genomics and molecular biology applications.
52
+
53
+ The Gellus corpus is a biomedical NER dataset specifically designed for gene recognition and genetics entity extraction in molecular biology literature. This corpus contains comprehensive annotations for gene names, genetic variants, and genomics-related entities that are essential for genetic research and genomics applications. The dataset supports the development of automated systems for gene mention identification, genetic association studies, and genomics text mining. It is particularly valuable for identifying genes involved in hereditary diseases, genetic disorders, and molecular genetics research. The corpus serves as a benchmark for evaluating NER models used in genetics research, personalized medicine, and genomics informatics, contributing to advances in precision medicine and genetic counseling applications.
54
+
55
+
56
+ ## 📊 Performance Metrics
57
+
58
+ ### Current Model Performance
59
+ - **F1 Score**: `0.99`
60
+ - **Precision**: `1.00`
61
+ - **Recall**: `0.98`
62
+ - **Accuracy**: `1.00`
63
+
64
+ ### 🏆 Comparative Performance on GELLUS Dataset
65
+
66
+ | Rank | Model | F1 Score | Precision | Recall | Accuracy |
67
+ |------|-------|----------|-----------|--------|-----------|
68
+ | 🥇 1 | [OpenMed-NER-GenomicDetect-SnowMed-568M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-SnowMed-568M) | **0.9976** | 0.9977 | 0.9975 | 0.9989 |
69
+ | 🥈 2 | [OpenMed-NER-GenomicDetect-SuperMedical-355M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-SuperMedical-355M) | **0.9970** | 0.9960 | 0.9981 | 0.9986 |
70
+ | 🥉 3 | [OpenMed-NER-GenomicDetect-BigMed-560M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-BigMed-560M) | **0.9968** | 0.9967 | 0.9969 | 0.9986 |
71
+ | 4 | [OpenMed-NER-GenomicDetect-MultiMed-568M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-MultiMed-568M) | **0.9967** | 0.9974 | 0.9960 | 0.9985 |
72
+ | 5 | [OpenMed-NER-GenomicDetect-PubMed-109M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-PubMed-109M) | **0.9964** | 0.9957 | 0.9970 | 0.9992 |
73
+ | 6 | [OpenMed-NER-GenomicDetect-PubMed-335M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-PubMed-335M) | **0.9963** | 0.9961 | 0.9965 | 0.9991 |
74
+ | 7 | [OpenMed-NER-GenomicDetect-PubMed-109M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-PubMed-109M) | **0.9951** | 0.9948 | 0.9953 | 0.9991 |
75
+ | 8 | [OpenMed-NER-GenomicDetect-BioMed-109M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-BioMed-109M) | **0.9941** | 0.9934 | 0.9949 | 0.9988 |
76
+ | 9 | [OpenMed-NER-GenomicDetect-TinyMed-82M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-TinyMed-82M) | **0.9940** | 0.9997 | 0.9884 | 0.9961 |
77
+ | 10 | [OpenMed-NER-GenomicDetect-SuperMedical-125M](https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-SuperMedical-125M) | **0.9934** | 0.9999 | 0.9870 | 0.9958 |
78
+
79
+
80
+ *Rankings based on F1-score performance across all models trained on this dataset.*
81
+
82
+ ![OpenMed (open-source) vs. latest closed-source SOTA](https://huggingface.co/spaces/OpenMed/README/resolve/main/openmed_vs_sota_performance.png)
83
+
84
+ *Figure: OpenMed (Open-Source) vs. Latest SOTA (Closed-Source) performance comparison across biomedical NER datasets.*
85
+
86
+ ## 🚀 Quick Start
87
+
88
+ ### Installation
89
+
90
+ ```bash
91
+ pip install transformers torch
92
+ ```
93
+
94
+ ### Usage
95
+
+ ```python
+ from transformers import pipeline
+
+ # Load the model and tokenizer
+ # Model: https://huggingface.co/OpenMed/OpenMed-NER-GenomicDetect-EuroMed-212M
+ model_name = "OpenMed/OpenMed-NER-GenomicDetect-EuroMed-212M"
+
+ # Create a token-classification pipeline.
+ # trust_remote_code=True is needed because the repository ships its own
+ # EuroBERT modeling code (see `auto_map` in config.json).
+ medical_ner_pipeline = pipeline(
+     "token-classification",
+     model=model_name,
+     aggregation_strategy="simple",
+     trust_remote_code=True,
+ )
+
+ # Example usage
+ text = "The BRCA2 gene is associated with hereditary breast cancer."
+ entities = medical_ner_pipeline(text)
+
+ print(entities)
+
+ # Print the surface text of the first entity, if any were found
+ if entities:
+     token = entities[0]
+     print(text[token["start"] : token["end"]])
+ ```
118
+
119
+ NOTE: The `aggregation_strategy` parameter defines how token predictions are grouped into entities. For a detailed explanation, please refer to the [Hugging Face documentation](https://huggingface.co/docs/transformers/en/main_classes/pipelines#transformers.TokenClassificationPipeline.aggregation_strategy).
120
+
121
+ Here is a summary of the available strategies:
122
+ - **`none`**: Returns raw token predictions without any aggregation.
123
+ - **`simple`**: Groups adjacent tokens with the same entity type (e.g., `B-LOC` followed by `I-LOC`).
124
+ - **`first`**: For word-based models, if tokens within a word have different entity tags, the tag of the first token is assigned to the entire word.
125
+ - **`average`**: For word-based models, this strategy averages the scores of tokens within a word and applies the label with the highest resulting score.
126
+ - **`max`**: For word-based models, the entity label from the token with the highest score within a word is assigned to the entire word.
127
+
128
+ ### Batch Processing
129
+
130
+ For efficient processing of large datasets, use proper batching with the `batch_size` parameter:
131
+
+ ```python
+ texts = [
+     "The BRCA2 gene is associated with hereditary breast cancer.",
+     "Mutations in the CFTR gene cause cystic fibrosis.",
+     "The APOE gene variant affects Alzheimer's disease risk.",
+     "The HTT gene provides instructions for making a protein called huntingtin.",
+     "Sickle cell disease is caused by a mutation in the HBB gene.",
+ ]
+
+ # Efficient batch processing with optimized batch size
+ # Adjust batch_size based on your GPU memory (typically 8, 16, 32, or 64)
+ results = medical_ner_pipeline(texts, batch_size=8)
+
+ for i, entities in enumerate(results):
+     print(f"Text {i+1} entities:")
+     for entity in entities:
+         print(f"  - {entity['word']} ({entity['entity_group']}): {entity['score']:.4f}")
+ ```
150
+
151
+ ### Large Dataset Processing
152
+
153
+ For processing large datasets efficiently:
154
+
+ ```python
+ # Requires: pip install datasets pandas
+ import pandas as pd
+ from datasets import Dataset, load_dataset
+ from transformers.pipelines.pt_utils import KeyDataset
+
+ # Load a public medical dataset from Hugging Face (first 100 examples for testing)
+ medical_dataset = load_dataset("BI55/MedText", split="train[:100]")
+ data = pd.DataFrame({"text": medical_dataset["Completion"]})
+ dataset = Dataset.from_pandas(data)
+
+ # Process with optimal batching for your hardware
+ batch_size = 16  # Tune this based on your GPU memory
+ results = []
+
+ # The pipeline yields one list of entities per input text
+ for out in medical_ner_pipeline(KeyDataset(dataset, "text"), batch_size=batch_size):
+     results.append(out)
+
+ print(f"Processed {len(results)} texts with batching")
+ ```
179
+
180
+ ### Performance Optimization
181
+
182
+ **Batch Size Guidelines:**
183
+ - **CPU**: Start with batch_size=1-4
184
+ - **Single GPU**: Try batch_size=8-32 depending on GPU memory
185
+ - **High-end GPU**: Can handle batch_size=64 or higher
186
+ - **Monitor GPU utilization** to find the optimal batch size for your hardware
187
+
188
+ **Memory Considerations:**
+ ```python
+ # For limited GPU memory, use smaller batches
+ medical_ner_pipeline = pipeline(
+     "token-classification",
+     model=model_name,
+     aggregation_strategy="simple",
+     trust_remote_code=True,  # the repo ships custom EuroBERT code
+     device=0,  # Specify GPU device
+ )
+
+ # Process with memory-efficient batching
+ batch_size = 8
+ results = []
+ for batch_start in range(0, len(texts), batch_size):
+     batch = texts[batch_start:batch_start + batch_size]
+     batch_results = medical_ner_pipeline(batch, batch_size=len(batch))
+     results.extend(batch_results)
+ ```
203
+
204
+ ## 📚 Dataset Information
205
+
206
+ - **Dataset**: GELLUS
207
+ - **Description**: Gene Entity Recognition - Gene-related entities
208
+
209
+ ### Training Details
210
+ - **Base Model**: EuroBERT-210m
211
+ - **Training Framework**: Hugging Face Transformers
212
+ - **Optimization**: AdamW optimizer with learning rate scheduling
213
+ - **Validation**: Cross-validation on held-out test set
214
+
215
+ ## 🔬 Model Architecture
216
+
217
+ - **Base Architecture**: EuroBERT-210m
218
+ - **Task**: Token Classification (Named Entity Recognition)
219
+ - **Labels**: Dataset-specific entity types
220
+ - **Input**: Tokenized biomedical text
221
+ - **Output**: BIO-tagged entity predictions
222
+
223
+ ## 💡 Use Cases
224
+
225
+ This model is particularly useful for:
226
+ - **Clinical Text Mining**: Extracting entities from medical records
227
+ - **Biomedical Research**: Processing scientific literature
228
+ - **Drug Discovery**: Identifying chemical compounds and drugs
229
+ - **Healthcare Analytics**: Analyzing patient data and outcomes
230
+ - **Academic Research**: Supporting biomedical NLP research
231
+
232
+ ## 📜 License
233
+
234
+ Licensed under the Apache License 2.0. See [LICENSE](https://www.apache.org/licenses/LICENSE-2.0) for details.
235
+
236
+ ## 🤝 Contributing
237
+
238
+ We welcome contributions of all kinds! Whether you have ideas, feature requests, or want to join our mission to advance open-source Healthcare AI, we'd love to hear from you.
239
+
240
+ Follow [OpenMed Org](https://huggingface.co/OpenMed) on Hugging Face 🤗 and click "Watch" to stay updated on our latest releases and developments.
241
+
242
+
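A minimal sketch, not part of the committed README, of the idea behind the `simple` aggregation strategy listed above: consecutive `B-`/`I-` tags of the same type are merged into one entity span. The helper `group_bio_tags` and the example tokens are hypothetical. The real `TokenClassificationPipeline` works on sub-word tokens and character offsets, so this only illustrates the grouping rule.

```python
def group_bio_tags(tokens, tags):
    """Merge BIO-tagged tokens into (entity_type, text) spans."""
    spans = []
    current_type, current_tokens = None, []

    def flush():
        # Emit the span collected so far, if there is one.
        if current_tokens:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            current_type, current_tokens = None, []
            continue
        # "B-Cell-line-name" -> ("B", "Cell-line-name")
        prefix, entity_type = tag.split("-", 1)
        if prefix == "B" or entity_type != current_type:
            # A B- tag or a change of type starts a new span.
            flush()
            current_type, current_tokens = entity_type, [token]
        else:
            current_tokens.append(token)
    flush()
    return spans


# Hypothetical, pre-tokenized example using the label names from config.json
tokens = ["HeLa", "cells", "and", "MCF", "-", "7", "were", "cultured"]
tags = [
    "B-Cell-line-name", "O", "O",
    "B-Cell-line-name", "I-Cell-line-name", "I-Cell-line-name",
    "O", "O",
]
print(group_bio_tags(tokens, tags))
```

With `aggregation_strategy="none"` the pipeline would instead return one prediction per token, leaving this kind of merging to the caller.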
config.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "architectures": [
3
+ "EuroBertForTokenClassification"
4
+ ],
5
+ "attention_bias": false,
6
+ "attention_dropout": 0.0,
7
+ "auto_map": {
8
+ "AutoConfig": "configuration_eurobert.EuroBertConfig",
9
+ "AutoModel": "modeling_eurobert.EuroBertModel",
10
+ "AutoModelForMaskedLM": "modeling_eurobert.EuroBertForMaskedLM",
11
+ "AutoModelForPreTraining": "modeling_eurobert.EuroBertPreTrainedModel",
12
+ "AutoModelForSequenceClassification": "modeling_eurobert.EuroBertForSequenceClassification",
13
+ "AutoModelForTokenClassification": "modeling_eurobert.EuroBertForTokenClassification"
14
+ },
15
+ "bos_token": "<|begin_of_text|>",
16
+ "bos_token_id": 128000,
17
+ "clf_pooling": "late",
18
+ "eos_token": "<|end_of_text|>",
19
+ "eos_token_id": 128001,
20
+ "head_dim": 64,
21
+ "hidden_act": "silu",
22
+ "hidden_dropout": 0.0,
23
+ "hidden_size": 768,
24
+ "id2label": {
25
+ "0": "B-Cell-line-name",
26
+ "1": "I-Cell-line-name",
27
+ "2": "O"
28
+ },
29
+ "initializer_range": 0.02,
30
+ "intermediate_size": 3072,
31
+ "label2id": {
32
+ "B-Cell-line-name": 0,
33
+ "I-Cell-line-name": 1,
34
+ "O": 2
35
+ },
36
+ "mask_token": "<|mask|>",
37
+ "mask_token_id": 128002,
38
+ "max_position_embeddings": 8192,
39
+ "mlp_bias": false,
40
+ "model_type": "eurobert",
41
+ "num_attention_heads": 12,
42
+ "num_hidden_layers": 12,
43
+ "num_key_value_heads": 12,
44
+ "pad_token": "<|end_of_text|>",
45
+ "pad_token_id": 128001,
46
+ "pretraining_tp": 1,
47
+ "rms_norm_eps": 1e-05,
48
+ "rope_scaling": null,
49
+ "rope_theta": 250000,
50
+ "tie_word_embeddings": false,
51
+ "torch_dtype": "bfloat16",
52
+ "transformers_version": "4.53.2",
53
+ "use_cache": false,
54
+ "vocab_size": 128256
55
+ }
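A few fields in this config.json depend on one another, and a short sanity check makes the relationships explicit. The sketch below is not part of the commit. It copies a handful of values from the file above into an inline JSON string, and `decode_predictions` is a hypothetical helper showing how predicted class ids map to label strings through `id2label`.

```python
import json

# Subset of the committed config.json, copied by hand for illustration
config = json.loads("""
{
  "head_dim": 64,
  "hidden_size": 768,
  "num_attention_heads": 12,
  "num_key_value_heads": 12,
  "id2label": {"0": "B-Cell-line-name", "1": "I-Cell-line-name", "2": "O"},
  "label2id": {"B-Cell-line-name": 0, "I-Cell-line-name": 1, "O": 2}
}
""")

# Each attention head covers hidden_size / num_attention_heads dimensions.
heads_match = config["head_dim"] * config["num_attention_heads"] == config["hidden_size"]

# label2id should be the exact inverse of id2label (JSON keys are strings).
inverse = {label: int(idx) for idx, label in config["id2label"].items()}
labels_match = inverse == config["label2id"]


def decode_predictions(class_ids, id2label):
    """Map integer class ids (e.g. argmax over logits) to label strings."""
    return [id2label[str(class_id)] for class_id in class_ids]


print(heads_match, labels_match)
print(decode_predictions([2, 0, 1, 2], config["id2label"]))
```

Note that JSON object keys are always strings, which is why the lookup converts each class id with `str` before indexing `id2label`.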
configuration_eurobert.py ADDED
@@ -0,0 +1,216 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/eurobert/modular_eurobert.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_eurobert.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 Nicolas Boizard, Duarte M. Alves, Hippolyte Gisserot-Boukhlef and the EuroBert team. All rights reserved.
9
+ #
10
+ #
11
+ # Licensed under the Apache License, Version 2.0 (the "License");
12
+ # you may not use this file except in compliance with the License.
13
+ # You may obtain a copy of the License at
14
+ #
15
+ # http://www.apache.org/licenses/LICENSE-2.0
16
+ #
17
+ # Unless required by applicable law or agreed to in writing, software
18
+ # distributed under the License is distributed on an "AS IS" BASIS,
19
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
20
+ # See the License for the specific language governing permissions and
21
+ # limitations under the License.
22
+
23
+ from transformers.utils import logging
24
+ from transformers.models.llama import LlamaConfig
25
+
26
+
27
+ logger = logging.get_logger(__name__)
28
+
29
+
30
+ class EuroBertConfig(LlamaConfig):
31
+ r"""
32
+ This is the configuration class to store the configuration of a [`EuroBertModel`]. It is used to instantiate an EuroBert
33
+ model according to the specified arguments, defining the model architecture. Instantiating a configuration with the
34
+ defaults will yield a similar configuration to that of the EuroBERT-210m.
35
+
36
+ Configuration objects inherit from [`PretrainedConfig`] and can be used to control the model outputs. Read the
37
+ documentation from [`PretrainedConfig`] for more information.
38
+
39
+
40
+ Args:
41
+ vocab_size (`int`, *optional*, defaults to 128256):
42
+ Vocabulary size of the EuroBert model. Defines the number of different tokens that can be represented by the
43
+ `input_ids` passed when calling [`EuroBertModel`]
44
+ hidden_size (`int`, *optional*, defaults to 768):
45
+ Dimensionality of the encoder layers and the pooler layer.
46
+ intermediate_size (`int`, *optional*, defaults to 3072):
47
+ Dimensionality of the "intermediate" (often named feed-forward) layer in the Transformer encoder.
48
+ num_hidden_layers (`int`, *optional*, defaults to 12):
49
+ Number of hidden layers in the Transformer encoder.
50
+ num_attention_heads (`int`, *optional*, defaults to 12):
51
+ Number of attention heads for each attention layer in the Transformer encoder.
52
+ num_key_value_heads (`int`, *optional*):
53
+ This is the number of key_value heads that should be used to implement Grouped Query Attention. If
54
+ `num_key_value_heads=num_attention_heads`, the model will use Multi Head Attention (MHA), if
55
+ `num_key_value_heads=1` the model will use Multi Query Attention (MQA) otherwise GQA is used. When
56
+ converting a multi-head checkpoint to a GQA checkpoint, each group key and value head should be constructed
57
+ by meanpooling all the original heads within that group. For more details checkout [this
58
+ paper](https://arxiv.org/pdf/2305.13245.pdf). If it is not specified, will default to
59
+ `num_attention_heads`.
60
+ hidden_act (`str` or `function`, *optional*, defaults to `"silu"`):
61
+ The non-linear activation function (function or string) in the encoder and pooler.
62
+ max_position_embeddings (`int`, *optional*, defaults to 8192):
63
+ The maximum sequence length that this model might ever be used with. EuroBert supports up to 8192 tokens,
64
+ EuroBert-pretrained up to 2048.
65
+ initializer_range (`float`, *optional*, defaults to 0.02):
66
+ The standard deviation of the truncated_normal_initializer for initializing all weight matrices.
67
+ rms_norm_eps (`float`, *optional*, defaults to 1e-05):
68
+ The epsilon used by the rms normalization layers.
69
+ bos_token_id (`int`, *optional*, defaults to 128000):
70
+ Beginning of stream token id.
71
+ eos_token_id (`int`, *optional*, defaults to 128001):
72
+ End of stream token id.
73
+ pad_token_id (`int`, *optional*, defaults to 128001):
74
+ Padding token id.
75
+ mask_token_id (`int`, *optional*, defaults to 128002):
76
+ Mask token id.
77
+ pretraining_tp (`int`, *optional*, defaults to 1):
78
+ Experimental feature. Tensor parallelism rank used during pretraining. Please refer to [this
79
+ document](https://huggingface.co/docs/transformers/main/perf_train_gpu_many#tensor-parallelism) to
80
+ understand more about it. This value is necessary to ensure exact reproducibility of the pretraining
81
+ results. Please refer to [this issue](https://github.com/pytorch/pytorch/issues/76232).
82
+ tie_word_embeddings (`bool`, *optional*, defaults to `False`):
83
+ Whether to tie weight embeddings
84
+ rope_theta (`float`, *optional*, defaults to 250000.0):
85
+ The base period of the RoPE embeddings. EuroBert used base period of 250000.0,
86
+ EuroBert-pretrained 10000.0.
87
+ rope_scaling (`Dict`, *optional*):
88
+ Dictionary containing the scaling configuration for the RoPE embeddings. NOTE: if you apply new rope type
89
+ and you expect the model to work on longer `max_position_embeddings`, we recommend you to update this value
90
+ accordingly.
91
+ Expected contents:
92
+ `rope_type` (`str`):
93
+ The sub-variant of RoPE to use. Can be one of ['default', 'linear', 'dynamic', 'yarn', 'longrope',
94
+ 'eurobert3'], with 'default' being the original RoPE implementation.
95
+ `factor` (`float`, *optional*):
96
+ Used with all rope types except 'default'. The scaling factor to apply to the RoPE embeddings. In
97
+ most scaling types, a `factor` of x will enable the model to handle sequences of length x *
98
+ original maximum pre-trained length.
99
+ `original_max_position_embeddings` (`int`, *optional*):
100
+ Used with 'dynamic', 'longrope' and 'eurobert3'. The original max position embeddings used during
101
+ pretraining.
102
+ `attention_factor` (`float`, *optional*):
103
+ Used with 'yarn' and 'longrope'. The scaling factor to be applied on the attention
104
+ computation. If unspecified, it defaults to value recommended by the implementation, using the
105
+ `factor` field to infer the suggested value.
106
+ `beta_fast` (`float`, *optional*):
107
+ Only used with 'yarn'. Parameter to set the boundary for extrapolation (only) in the linear
108
+ ramp function. If unspecified, it defaults to 32.
109
+ `beta_slow` (`float`, *optional*):
110
+ Only used with 'yarn'. Parameter to set the boundary for interpolation (only) in the linear
111
+ ramp function. If unspecified, it defaults to 1.
112
+ `short_factor` (`List[float]`, *optional*):
113
+ Only used with 'longrope'. The scaling factor to be applied to short contexts (<
114
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
115
+ size divided by the number of attention heads divided by 2
116
+ `long_factor` (`List[float]`, *optional*):
117
+ Only used with 'longrope'. The scaling factor to be applied to long contexts (>
118
+ `original_max_position_embeddings`). Must be a list of numbers with the same length as the hidden
119
+ size divided by the number of attention heads divided by 2
120
+ `low_freq_factor` (`float`, *optional*):
121
+ Only used with 'eurobert3'. Scaling factor applied to low frequency components of the RoPE
122
+ `high_freq_factor` (`float`, *optional*):
123
+ Only used with 'eurobert3'. Scaling factor applied to high frequency components of the RoPE
124
+ attention_bias (`bool`, *optional*, defaults to `False`):
125
+ Whether to use a bias in the query, key, value and output projection layers during self-attention.
126
+ attention_dropout (`float`, *optional*, defaults to 0.0):
127
+ The dropout ratio for the attention probabilities.
128
+ mlp_bias (`bool`, *optional*, defaults to `False`):
129
+ Whether to use a bias in up_proj, down_proj and gate_proj layers in the MLP layers.
130
+ head_dim (`int`, *optional*):
131
+ The attention head dimension. If None, it will default to hidden_size // num_attention_heads
132
+ classifier_pooling (`str`, *optional*, defaults to `"late"`):
133
+ The pooling strategy to use for the classifier. Can be one of ['bos', 'mean', 'late'].
134
+
135
+ ```python
136
+ >>> from transformers import EuroBertModel, EuroBertConfig
137
+
138
+ >>> # Initializing a EuroBert eurobert-base style configuration
139
+ >>> configuration = EuroBertConfig()
140
+
141
+ >>> # Initializing a model from the eurobert-base style configuration
142
+ >>> model = EuroBertModel(configuration)
143
+
144
+ >>> # Accessing the model configuration
145
+ >>> configuration = model.config
146
+ ```"""
147
+
148
+ model_type = "eurobert"
149
+
150
+ def __init__(
151
+ self,
152
+ vocab_size=128256,
153
+ hidden_size=768,
154
+ intermediate_size=3072,
155
+ num_hidden_layers=12,
156
+ num_attention_heads=12,
157
+ num_key_value_heads=None,
158
+ hidden_act="silu",
159
+ max_position_embeddings=8192,
160
+ initializer_range=0.02,
161
+ rms_norm_eps=1e-05,
162
+ bos_token_id=128000,
163
+ eos_token_id=128001,
164
+ pad_token_id=128001,
165
+ mask_token_id=128002,
166
+ pretraining_tp=1,
167
+ tie_word_embeddings=False,
168
+ rope_theta=250000.0,
169
+ rope_scaling=None,
170
+ attention_bias=False,
171
+ attention_dropout=0.0,
172
+ mlp_bias=False,
173
+ head_dim=None,
174
+ classifier_pooling="late",
175
+ **kwargs,
176
+ ):
177
+ # use_cache is specific to decoder models and should be set to False for encoder models
178
+ use_cache = kwargs.pop("use_cache", None)
179
+ if use_cache:
180
+ logger.warning_once(
181
+ "The `use_cache` argument to EuroBertConfig is set to `False`, as caching is never used for encoder models."
182
+ )
183
+
184
+ if num_key_value_heads is None:
185
+ num_key_value_heads = num_attention_heads
186
+
187
+ super().__init__(
188
+ vocab_size=vocab_size,
189
+ hidden_size=hidden_size,
190
+ intermediate_size=intermediate_size,
191
+ num_hidden_layers=num_hidden_layers,
192
+ num_attention_heads=num_attention_heads,
193
+ num_key_value_heads=num_key_value_heads,
194
+ hidden_act=hidden_act,
195
+ max_position_embeddings=max_position_embeddings,
196
+ initializer_range=initializer_range,
197
+ rms_norm_eps=rms_norm_eps,
198
+ use_cache=False,
199
+ bos_token_id=bos_token_id,
200
+ eos_token_id=eos_token_id,
201
+ pad_token_id=pad_token_id,
202
+ pretraining_tp=pretraining_tp,
203
+ tie_word_embeddings=tie_word_embeddings,
204
+ rope_theta=rope_theta,
205
+ rope_scaling=rope_scaling,
206
+ attention_bias=attention_bias,
207
+ attention_dropout=attention_dropout,
208
+ mlp_bias=mlp_bias,
209
+ head_dim=head_dim,
210
+ **kwargs,
211
+ )
212
+ self.mask_token_id = mask_token_id
213
+ self.clf_pooling = classifier_pooling
214
+
215
+
216
+ __all__ = ["EuroBertConfig"]
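The `num_key_value_heads` docstring above describes three attention variants selected purely by head counts. The sketch below is not part of the committed file and restates that rule as a standalone function. `attention_variant` is a hypothetical name, and the divisibility check is an assumption about what a valid configuration looks like rather than something `EuroBertConfig` itself enforces.

```python
def attention_variant(num_attention_heads, num_key_value_heads=None):
    """Classify the attention scheme implied by the two head counts."""
    # Mirrors the default in EuroBertConfig.__init__: fall back to full MHA.
    if num_key_value_heads is None:
        num_key_value_heads = num_attention_heads
    # Assumption: query heads are split evenly across key/value heads.
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError("num_attention_heads must be divisible by num_key_value_heads")
    if num_key_value_heads == num_attention_heads:
        return "MHA"  # every query head has its own key/value head
    if num_key_value_heads == 1:
        return "MQA"  # all query heads share one key/value head
    return "GQA"      # query heads share key/value heads in groups


print(attention_variant(12))      # defaults used by this checkpoint
print(attention_variant(12, 1))
print(attention_variant(12, 4))
```

For the committed config.json (`num_attention_heads` and `num_key_value_heads` both 12), this rule selects plain multi-head attention.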
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:52ed52b8609f64c25c497bed60920c9328ff7599c90db1023ab28095b515142a
3
+ size 423549134
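The Git LFS pointer above stores only a hash and a byte size, but the size already allows a rough cross-check of the "212M" in the model name. The sketch below is not part of the commit. It assumes every tensor is stored as bfloat16 (2 bytes per value, as `torch_dtype` in config.json suggests) and ignores the small safetensors header, so the result is an upper bound rather than an exact count. `parse_lfs_pointer` is a hypothetical helper.

```python
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:52ed52b8609f64c25c497bed60920c9328ff7599c90db1023ab28095b515142a
size 423549134
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


pointer = parse_lfs_pointer(pointer_text)
size_bytes = int(pointer["size"])

BYTES_PER_BF16 = 2  # assumption: all weights are stored as bfloat16
approx_params = size_bytes // BYTES_PER_BF16

print(f"{size_bytes / 1e6:.1f} MB on disk")
print(f"about {approx_params / 1e6:.1f}M parameters (upper bound)")
```

Under these assumptions the file holds at most 211,774,567 values, which is consistent with the 212M suffix in the repository name.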
modeling_eurobert.py ADDED
@@ -0,0 +1,960 @@
1
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
2
+ # This file was automatically generated from src/transformers/models/eurobert/modular_eurobert.py.
3
+ # Do NOT edit this file manually as any edits will be overwritten by the generation of
4
+ # the file from the modular. If any change should be done, please apply the change to the
5
+ # modular_eurobert.py file directly. One of our CI enforces this.
6
+ # 🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨🚨
7
+ # coding=utf-8
8
+ # Copyright 2025 Nicolas Boizard, Duarte M. Alves, Hippolyte Gisserot-Boukhlef and the EuroBert team. All rights reserved.
9
+ #
10
+ #
11
+ # Licensed under the Apache License, Version 2.0 (the "License");
12
+ # you may not use this file except in compliance with the License.
13
+ # You may obtain a copy of the License at
14
+ #
15
+ # http://www.apache.org/licenses/LICENSE-2.0
16
+ #
17
+ # Unless required by applicable law or agreed to in writing, software
18
+ # distributed under the License is distributed on an "AS IS" BASIS,
19
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
20
+ # See the License for the specific language governing permissions and
21
+ # limitations under the License.
22
+
23
+ from typing import Callable, Optional, Tuple, Union
24
+
25
+ import torch
26
+ from torch import nn
27
+ from torch.nn import BCEWithLogitsLoss, CrossEntropyLoss, MSELoss
28
+
29
+ from transformers.activations import ACT2FN
30
+ from transformers.cache_utils import Cache, StaticCache
31
+ from transformers.modeling_attn_mask_utils import AttentionMaskConverter
32
+ from transformers.modeling_flash_attention_utils import FlashAttentionKwargs
33
+ from transformers.modeling_outputs import BaseModelOutput, BaseModelOutputWithPast, MaskedLMOutput, SequenceClassifierOutput, TokenClassifierOutput
34
+ from transformers.modeling_rope_utils import ROPE_INIT_FUNCTIONS
35
+ from transformers.modeling_utils import ALL_ATTENTION_FUNCTIONS, PreTrainedModel
36
+ from transformers.processing_utils import Unpack
37
+ from transformers.utils import add_code_sample_docstrings, add_start_docstrings, add_start_docstrings_to_model_forward, logging
38
+ from .configuration_eurobert import EuroBertConfig
39
+
40
+
41
+ logger = logging.get_logger(__name__)
42
+
43
+ _CHECKPOINT_FOR_DOC = "EuroBERT/EuroBERT-210m"
44
+ _CONFIG_FOR_DOC = "EuroBertConfig"
45
+
46
+
47
+ class EuroBertRMSNorm(nn.Module):
48
+ def __init__(self, hidden_size, eps=1e-5):
49
+ """
50
+ EuroBertRMSNorm is equivalent to T5LayerNorm
51
+ """
52
+ super().__init__()
53
+ self.weight = nn.Parameter(torch.ones(hidden_size))
54
+ self.variance_epsilon = eps
55
+
56
+ def forward(self, hidden_states):
57
+ input_dtype = hidden_states.dtype
58
+ hidden_states = hidden_states.to(torch.float32)
59
+ variance = hidden_states.pow(2).mean(-1, keepdim=True)
60
+ hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
61
+ return self.weight * hidden_states.to(input_dtype)
62
+
63
+ def extra_repr(self):
64
+ return f"{tuple(self.weight.shape)}, eps={self.variance_epsilon}"
65
+
66
+
67
+ def rotate_half(x):
68
+ """Rotates half the hidden dims of the input."""
69
+ x1 = x[..., : x.shape[-1] // 2]
70
+ x2 = x[..., x.shape[-1] // 2 :]
71
+ return torch.cat((-x2, x1), dim=-1)
72
+
73
+
74
+ def apply_rotary_pos_emb(q, k, cos, sin, position_ids=None, unsqueeze_dim=1):
75
+ """Applies Rotary Position Embedding to the query and key tensors.
76
+
77
+ Args:
78
+ q (`torch.Tensor`): The query tensor.
79
+ k (`torch.Tensor`): The key tensor.
80
+ cos (`torch.Tensor`): The cosine part of the rotary embedding.
81
+ sin (`torch.Tensor`): The sine part of the rotary embedding.
82
+ position_ids (`torch.Tensor`, *optional*):
83
+ Deprecated and unused.
84
+ unsqueeze_dim (`int`, *optional*, defaults to 1):
85
+ The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
86
+ sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
87
+ that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
88
+ k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
89
+ cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
90
+ the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
91
+ Returns:
92
+ `tuple(torch.Tensor)` comprising the query and key tensors rotated using the Rotary Position Embedding.
93
+ """
94
+ cos = cos.unsqueeze(unsqueeze_dim)
95
+ sin = sin.unsqueeze(unsqueeze_dim)
96
+ q_embed = (q * cos) + (rotate_half(q) * sin)
97
+ k_embed = (k * cos) + (rotate_half(k) * sin)
98
+ return q_embed, k_embed
99
+
100
+
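The rotation performed by `rotate_half` and `apply_rotary_pos_emb` can be sketched with plain lists (illustrative only, not part of the committed file). The helper `apply_rope` and the frequency base of `10000.0` are assumptions for the example; in the module the cos/sin tables come from `EuroBertRotaryEmbedding` and the base is set by the config.

```python
import math


def rotate_half(x):
    """Map (x1, x2) to (-x2, x1), where x1 and x2 are the two halves of x."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def apply_rope(x, position, base=10000.0):
    """Rotate each pair (x[i], x[i + half]) by an angle proportional to `position`."""
    half = len(x) // 2
    # One frequency per pair, duplicated across both halves like torch.cat((freqs, freqs)).
    angles = [position * base ** (-i / half) for i in range(half)] * 2
    rotated = rotate_half(x)
    return [v * math.cos(a) + r * math.sin(a) for v, r, a in zip(x, rotated, angles)]


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.25]
# Both pairs of positions are two steps apart, so the two scores should agree.
score_near = dot(apply_rope(q, 5), apply_rope(k, 3))
score_far = dot(apply_rope(q, 12), apply_rope(k, 10))
# A rotation also leaves the vector norm unchanged.
norm_before = dot(q, q)
norm_after = dot(apply_rope(q, 7), apply_rope(q, 7))
```

Because each pair is rotated rigidly, the query-key dot product depends only on the relative offset between positions, which is the property attention relies on.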
101
+ def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
102
+ """
103
+ This is the equivalent of torch.repeat_interleave(x, dim=1, repeats=n_rep). The hidden states go from (batch,
104
+ num_key_value_heads, seqlen, head_dim) to (batch, num_attention_heads, seqlen, head_dim)
105
+ """
106
+ batch, num_key_value_heads, slen, head_dim = hidden_states.shape
107
+ if n_rep == 1:
108
+ return hidden_states
109
+ hidden_states = hidden_states[:, :, None, :, :].expand(batch, num_key_value_heads, n_rep, slen, head_dim)
110
+ return hidden_states.reshape(batch, num_key_value_heads * n_rep, slen, head_dim)
111
+
112
+
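The head ordering produced by `repeat_kv` is easy to get wrong, so a small sketch may help (illustrative only, not part of the committed file). The helper `repeat_kv_heads` is hypothetical and uses string labels in place of per-head tensors.

```python
def repeat_kv_heads(kv_heads, n_rep):
    """Repeat each key/value head n_rep times, keeping the copies adjacent."""
    # Interleaved repetition (kv0, kv0, kv1, kv1, ...), not tiling (kv0, kv1, kv0, kv1, ...).
    return [head for head in kv_heads for _ in range(n_rep)]


# Two key/value heads shared by six query heads: each is reused three times.
expanded = repeat_kv_heads(["kv0", "kv1"], 3)
```

With this layout, query head `i` reads the key/value head at index `i // n_rep`, which matches the grouped-query convention the docstring describes through `torch.repeat_interleave`.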
113
+ def eager_attention_forward(
114
+ module: nn.Module,
115
+ query: torch.Tensor,
116
+ key: torch.Tensor,
117
+ value: torch.Tensor,
118
+ attention_mask: Optional[torch.Tensor],
119
+ scaling: float,
120
+ dropout: float = 0.0,
121
+ **kwargs,
122
+ ):
123
+ key_states = repeat_kv(key, module.num_key_value_groups)
124
+ value_states = repeat_kv(value, module.num_key_value_groups)
125
+
126
+ attn_weights = torch.matmul(query, key_states.transpose(2, 3)) * scaling
127
+ if attention_mask is not None:
128
+ causal_mask = attention_mask[:, :, :, : key_states.shape[-2]]
129
+ attn_weights = attn_weights + causal_mask
130
+
131
+ attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query.dtype)
132
+ attn_weights = nn.functional.dropout(attn_weights, p=dropout, training=module.training)
133
+ attn_output = torch.matmul(attn_weights, value_states)
134
+ attn_output = attn_output.transpose(1, 2).contiguous()
135
+
136
+ return attn_output, attn_weights
137
+
138
+
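The additive mask used in `eager_attention_forward` can be illustrated for a single query row (a sketch under stated assumptions, not part of the committed file). The helper `masked_softmax` is hypothetical, and `-1e30` stands in for the `torch.finfo(dtype).min` fill value used by the mask utilities.

```python
import math


def masked_softmax(scores, keep):
    """Softmax over `scores` after adding a large negative bias where `keep` is 0."""
    neg = -1e30  # stand-in for torch.finfo(dtype).min
    shifted = [s + (0.0 if m else neg) for s, m in zip(scores, keep)]
    peak = max(shifted)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# The third key is padding: it has the largest raw score but must get no weight.
weights = masked_softmax([2.0, 1.0, 3.0], [1, 1, 0])
```

Adding the bias before the softmax, rather than zeroing weights afterwards, keeps the remaining weights normalized to sum to 1.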
139
+ class EuroBertAttention(nn.Module):
140
+ """Multi-headed attention from 'Attention Is All You Need' paper"""
141
+
142
+ def __init__(self, config: EuroBertConfig, layer_idx: int):
143
+ super().__init__()
144
+ self.config = config
145
+ self.layer_idx = layer_idx
146
+ self.head_dim = getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)
147
+ self.num_key_value_groups = config.num_attention_heads // config.num_key_value_heads
148
+ self.scaling = self.head_dim**-0.5
149
+ self.attention_dropout = config.attention_dropout
150
+ self.is_causal = False
151
+
152
+ self.q_proj = nn.Linear(
153
+ config.hidden_size, config.num_attention_heads * self.head_dim, bias=config.attention_bias
154
+ )
155
+ self.k_proj = nn.Linear(
156
+ config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias
157
+ )
158
+ self.v_proj = nn.Linear(
159
+ config.hidden_size, config.num_key_value_heads * self.head_dim, bias=config.attention_bias
160
+ )
161
+ self.o_proj = nn.Linear(
162
+ config.num_attention_heads * self.head_dim, config.hidden_size, bias=config.attention_bias
163
+ )
164
+
165
+ def forward(
166
+ self,
167
+ hidden_states: torch.Tensor,
168
+ position_embeddings: Tuple[torch.Tensor, torch.Tensor],
169
+ attention_mask: Optional[torch.Tensor],
170
+ **kwargs: Unpack[FlashAttentionKwargs],
171
+ ) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
172
+ input_shape = hidden_states.shape[:-1]
173
+ hidden_shape = (*input_shape, -1, self.head_dim)
174
+
175
+ query_states = self.q_proj(hidden_states).view(hidden_shape).transpose(1, 2)
176
+ key_states = self.k_proj(hidden_states).view(hidden_shape).transpose(1, 2)
177
+ value_states = self.v_proj(hidden_states).view(hidden_shape).transpose(1, 2)
178
+
179
+ cos, sin = position_embeddings
180
+ query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
181
+
182
+ attention_interface: Callable = eager_attention_forward
183
+ if self.config._attn_implementation != "eager":
184
+ if self.config._attn_implementation == "sdpa" and kwargs.get("output_attentions", False):
185
+ logger.warning_once(
186
+ "`torch.nn.functional.scaled_dot_product_attention` does not support `output_attentions=True`. Falling back to "
187
+ 'eager attention. This warning can be removed using the argument `attn_implementation="eager"` when loading the model.'
188
+ )
189
+ else:
190
+ attention_interface = ALL_ATTENTION_FUNCTIONS[self.config._attn_implementation]
191
+
192
+ attn_output, attn_weights = attention_interface(
193
+ self,
194
+ query_states,
195
+ key_states,
196
+ value_states,
197
+ attention_mask,
198
+ dropout=0.0 if not self.training else self.attention_dropout,
199
+ scaling=self.scaling,
200
+ is_causal=False,
201
+ **kwargs,
202
+ )
203
+
204
+ attn_output = attn_output.reshape(*input_shape, -1).contiguous()
205
+ attn_output = self.o_proj(attn_output)
206
+ return attn_output, attn_weights
207
+
208
+
209
+ EUROBERT_START_DOCSTRING = r"""
210
+ This model inherits from [`PreTrainedModel`]. Check the superclass documentation for the generic methods the
211
+ library implements for all its models (such as downloading or saving, resizing the input embeddings, pruning heads
212
+ etc.)
213
+
214
+ This model is also a PyTorch [torch.nn.Module](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) subclass.
215
+ Use it as a regular PyTorch Module and refer to the PyTorch documentation for all matters related to general usage
216
+ and behavior.
217
+
218
+ Parameters:
219
+ config ([`EuroBertConfig`]):
220
+ Model configuration class with all the parameters of the model. Initializing with a config file does not
221
+ load the weights associated with the model, only the configuration. Check out the
222
+ [`~PreTrainedModel.from_pretrained`] method to load the model weights.
223
+ """
224
+
225
+
226
+ @add_start_docstrings(
227
+ "The bare EuroBERT Model outputting raw hidden-states without any specific head on top.",
228
+ EUROBERT_START_DOCSTRING,
229
+ )
230
+ class EuroBertPreTrainedModel(PreTrainedModel):
231
+ config_class = EuroBertConfig
232
+ base_model_prefix = "model"
233
+ supports_gradient_checkpointing = True
234
+ _no_split_modules = ["EuroBertDecoderLayer"]
235
+ _skip_keys_device_placement = ["past_key_values"]
236
+ _supports_flash_attn_2 = True
237
+ _supports_sdpa = True
238
+ _supports_flex_attn = True
239
+ _supports_cache_class = True
240
+ _supports_quantized_cache = True
241
+ _supports_static_cache = True
242
+ _supports_attention_backend = True
243
+
244
+ def _init_weights(self, module):
245
+ std = self.config.initializer_range
246
+ if isinstance(module, nn.Linear):
247
+ module.weight.data.normal_(mean=0.0, std=std)
248
+ if module.bias is not None:
249
+ module.bias.data.zero_()
250
+ elif isinstance(module, nn.Embedding):
251
+ module.weight.data.normal_(mean=0.0, std=std)
252
+ if module.padding_idx is not None:
253
+ module.weight.data[module.padding_idx].zero_()
254
+
255
+
256
+ class EuroBertRotaryEmbedding(nn.Module):
257
+ def __init__(self, config: EuroBertConfig, device=None):
258
+ super().__init__()
259
+ # BC: "rope_type" was originally "type"
260
+ if hasattr(config, "rope_scaling") and config.rope_scaling is not None:
261
+ self.rope_type = config.rope_scaling.get("rope_type", config.rope_scaling.get("type"))
262
+ else:
263
+ self.rope_type = "default"
264
+ self.max_seq_len_cached = config.max_position_embeddings
265
+ self.original_max_seq_len = config.max_position_embeddings
266
+
267
+ self.config = config
268
+ self.rope_init_fn = ROPE_INIT_FUNCTIONS[self.rope_type]
269
+
270
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device)
271
+ self.register_buffer("inv_freq", inv_freq, persistent=False)
272
+ self.original_inv_freq = self.inv_freq
273
+
274
+ def _dynamic_frequency_update(self, position_ids, device):
275
+ """
276
+ dynamic RoPE layers should recompute `inv_freq` in the following situations:
277
+ 1 - growing beyond the cached sequence length (allow scaling)
278
+ 2 - the current sequence length is in the original scale (avoid losing precision with small sequences)
279
+ """
280
+ seq_len = torch.max(position_ids) + 1
281
+ if seq_len > self.max_seq_len_cached: # growth
282
+ inv_freq, self.attention_scaling = self.rope_init_fn(self.config, device, seq_len=seq_len)
283
+ self.register_buffer("inv_freq", inv_freq, persistent=False) # TODO joao: may break with compilation
284
+ self.max_seq_len_cached = seq_len
285
+
286
+ if seq_len < self.original_max_seq_len and self.max_seq_len_cached > self.original_max_seq_len: # reset
287
+ # This .to() is needed if the model has been moved to a device after being initialized (because
288
+ # the buffer is automatically moved, but not the original copy)
289
+ self.original_inv_freq = self.original_inv_freq.to(device)
290
+ self.register_buffer("inv_freq", self.original_inv_freq, persistent=False)
291
+ self.max_seq_len_cached = self.original_max_seq_len
292
+
293
+ @torch.no_grad()
294
+ def forward(self, x, position_ids):
295
+ if "dynamic" in self.rope_type:
296
+ self._dynamic_frequency_update(position_ids, device=x.device)
297
+
298
+ # Core RoPE block
299
+ inv_freq_expanded = self.inv_freq[None, :, None].float().expand(position_ids.shape[0], -1, 1)
300
+ position_ids_expanded = position_ids[:, None, :].float()
301
+ # Force float32 (see https://github.com/huggingface/transformers/pull/29285)
302
+ device_type = x.device.type
303
+ device_type = device_type if isinstance(device_type, str) and device_type != "mps" else "cpu"
304
+ with torch.autocast(device_type=device_type, enabled=False):
305
+ freqs = (inv_freq_expanded.float() @ position_ids_expanded.float()).transpose(1, 2)
306
+ emb = torch.cat((freqs, freqs), dim=-1)
307
+ cos = emb.cos()
308
+ sin = emb.sin()
309
+
310
+ # Advanced RoPE types (e.g. yarn) apply a post-processing scaling factor, equivalent to scaling attention
311
+ cos = cos * self.attention_scaling
312
+ sin = sin * self.attention_scaling
313
+
314
+ return cos.to(dtype=x.dtype), sin.to(dtype=x.dtype)
315
+
316
+
317
+ class EuroBertMLP(nn.Module):
318
+ def __init__(self, config):
319
+ super().__init__()
320
+ self.config = config
321
+ self.hidden_size = config.hidden_size
322
+ self.intermediate_size = config.intermediate_size
323
+ self.gate_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
324
+ self.up_proj = nn.Linear(self.hidden_size, self.intermediate_size, bias=config.mlp_bias)
325
+ self.down_proj = nn.Linear(self.intermediate_size, self.hidden_size, bias=config.mlp_bias)
326
+ self.act_fn = ACT2FN[config.hidden_act]
327
+
328
+ def forward(self, x):
329
+ down_proj = self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
330
+ return down_proj
331
+
332
+
333
+ class EuroBertDecoderLayer(nn.Module):
334
+ def __init__(self, config: EuroBertConfig, layer_idx: int):
335
+ super().__init__()
336
+ self.hidden_size = config.hidden_size
337
+
338
+ self.self_attn = EuroBertAttention(config=config, layer_idx=layer_idx)
339
+
340
+ self.mlp = EuroBertMLP(config)
341
+ self.input_layernorm = EuroBertRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
342
+ self.post_attention_layernorm = EuroBertRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
343
+
344
+ def forward(
345
+ self,
346
+ hidden_states: torch.Tensor,
347
+ attention_mask: Optional[torch.Tensor] = None,
348
+ position_ids: Optional[torch.LongTensor] = None,
349
+ past_key_value: Optional[Cache] = None,
350
+ output_attentions: Optional[bool] = False,
351
+ use_cache: Optional[bool] = False,
352
+ cache_position: Optional[torch.LongTensor] = None,
353
+ position_embeddings: Optional[Tuple[torch.Tensor, torch.Tensor]] = None, # necessary, but kept here for BC
354
+ **kwargs: Unpack[FlashAttentionKwargs],
355
+ ) -> Tuple[torch.FloatTensor, Optional[Tuple[torch.FloatTensor, torch.FloatTensor]]]:
356
+ residual = hidden_states
357
+
358
+ hidden_states = self.input_layernorm(hidden_states)
359
+
360
+ # Self Attention
361
+ hidden_states, self_attn_weights = self.self_attn(
362
+ hidden_states=hidden_states,
363
+ attention_mask=attention_mask,
364
+ position_ids=position_ids,
365
+ past_key_value=past_key_value,
366
+ output_attentions=output_attentions,
367
+ use_cache=use_cache,
368
+ cache_position=cache_position,
369
+ position_embeddings=position_embeddings,
370
+ **kwargs,
371
+ )
372
+ hidden_states = residual + hidden_states
373
+
374
+ # Fully Connected
375
+ residual = hidden_states
376
+ hidden_states = self.post_attention_layernorm(hidden_states)
377
+ hidden_states = self.mlp(hidden_states)
378
+ hidden_states = residual + hidden_states
379
+
380
+ outputs = (hidden_states,)
381
+ if output_attentions:
382
+ outputs += (self_attn_weights,)
383
+
384
+ return outputs
385
+
386
+
387
+ EUROBERT_INPUTS_DOCSTRING = r"""
388
+ Args:
389
+ input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
390
+ Indices of input sequence tokens in the vocabulary. Padding will be ignored by default should you provide
391
+ it.
392
+
393
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
394
+ [`PreTrainedTokenizer.__call__`] for details.
395
+
396
+ [What are input IDs?](../glossary#input-ids)
397
+ attention_mask (`torch.Tensor` of shape `(batch_size, sequence_length)`, *optional*):
398
+ Mask to avoid performing attention on padding token indices. Mask values selected in `[0, 1]`:
399
+
400
+ - 1 for tokens that are **not masked**,
401
+ - 0 for tokens that are **masked**.
402
+
403
+ [What are attention masks?](../glossary#attention-mask)
404
+
405
+ Indices can be obtained using [`AutoTokenizer`]. See [`PreTrainedTokenizer.encode`] and
406
+ [`PreTrainedTokenizer.__call__`] for details.
407
+
408
+ If `past_key_values` is used, optionally only the last `input_ids` have to be input (see
409
+ `past_key_values`).
410
+
411
+ If you want to change padding behavior, you should read [`modeling_opt._prepare_decoder_attention_mask`]
412
+ and modify to your needs. See diagram 1 in [the paper](https://arxiv.org/abs/1910.13461) for more
413
+ information on the default strategy.
414
+
415
+ - 1 indicates the head is **not masked**,
416
+ - 0 indicates the head is **masked**.
417
+ position_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
418
+ Indices of positions of each input sequence token in the position embeddings. Selected in the range `[0,
419
+ config.n_positions - 1]`.
420
+
421
+ [What are position IDs?](../glossary#position-ids)
422
+ past_key_values (`Cache` or `tuple(tuple(torch.FloatTensor))`, *optional*):
423
+ Pre-computed hidden-states (key and values in the self-attention blocks and in the cross-attention
424
+ blocks) that can be used to speed up sequential decoding. This typically consists of the `past_key_values`
425
+ returned by the model at a previous stage of decoding, when `use_cache=True` or `config.use_cache=True`.
426
+
427
+ Two formats are allowed:
428
+ - a [`~cache_utils.Cache`] instance, see our
429
+ [kv cache guide](https://huggingface.co/docs/transformers/en/kv_cache);
430
+ - Tuple of `tuple(torch.FloatTensor)` of length `config.n_layers`, with each tuple having 2 tensors of
431
+ shape `(batch_size, num_heads, sequence_length, embed_size_per_head)`. This is also known as the legacy
432
+ cache format.
433
+
434
+ The model will output the same cache format that is fed as input. If no `past_key_values` are passed, the
435
+ legacy cache format will be returned.
436
+
437
+ If `past_key_values` are used, the user can optionally input only the last `input_ids` (those that don't
438
+ have their past key value states given to this model) of shape `(batch_size, 1)` instead of all `input_ids`
439
+ of shape `(batch_size, sequence_length)`.
440
+ inputs_embeds (`torch.FloatTensor` of shape `(batch_size, sequence_length, hidden_size)`, *optional*):
441
+ Optionally, instead of passing `input_ids` you can choose to directly pass an embedded representation. This
442
+ is useful if you want more control over how to convert `input_ids` indices into associated vectors than the
443
+ model's internal embedding lookup matrix.
444
+ use_cache (`bool`, *optional*):
445
+ If set to `True`, `past_key_values` key value states are returned and can be used to speed up decoding (see
446
+ `past_key_values`).
447
+ output_attentions (`bool`, *optional*):
448
+ Whether or not to return the attentions tensors of all attention layers. See `attentions` under returned
449
+ tensors for more detail.
450
+ output_hidden_states (`bool`, *optional*):
451
+ Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
452
+ more detail.
453
+ return_dict (`bool`, *optional*):
454
+ Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.
455
+ cache_position (`torch.LongTensor` of shape `(sequence_length)`, *optional*):
456
+ Indices depicting the position of the input sequence tokens in the sequence. Contrary to `position_ids`,
457
+ this tensor is not affected by padding. It is used to update the cache in the correct position and to infer
458
+ the complete sequence length.
459
+ """
460
+
461
+
462
+ @add_start_docstrings(
463
+ "The bare EuroBert Model outputting raw hidden-states without any specific head on top.",
464
+ EUROBERT_START_DOCSTRING,
465
+ )
466
+ class EuroBertModel(EuroBertPreTrainedModel):
467
+ """
468
+ Transformer encoder consisting of *config.num_hidden_layers* layers. Each layer is a [`EuroBertDecoderLayer`]
469
+
470
+ Args:
471
+ config: EuroBertConfig
472
+ """
473
+
474
+ def __init__(self, config: EuroBertConfig):
475
+ super().__init__(config)
476
+ self.padding_idx = config.pad_token_id
477
+ self.vocab_size = config.vocab_size
478
+
479
+ self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)
480
+ self.layers = nn.ModuleList(
481
+ [EuroBertDecoderLayer(config, layer_idx) for layer_idx in range(config.num_hidden_layers)]
482
+ )
483
+ self.norm = EuroBertRMSNorm(config.hidden_size, eps=config.rms_norm_eps)
484
+ self.rotary_emb = EuroBertRotaryEmbedding(config=config)
485
+ self.gradient_checkpointing = False
486
+ self.mask_converter = AttentionMaskConverter(is_causal=False)
487
+
488
+ # Initialize weights and apply final processing
489
+ self.post_init()
490
+
491
+ def get_input_embeddings(self):
492
+ return self.embed_tokens
493
+
494
+ def set_input_embeddings(self, value):
495
+ self.embed_tokens = value
496
+
497
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
498
+ @add_code_sample_docstrings(
499
+ checkpoint=_CHECKPOINT_FOR_DOC,
500
+ output_type=BaseModelOutput,
501
+ config_class=_CONFIG_FOR_DOC,
502
+ )
503
+ def forward(
504
+ self,
505
+ input_ids: Optional[torch.LongTensor] = None,
506
+ attention_mask: Optional[torch.Tensor] = None,
507
+ position_ids: Optional[torch.LongTensor] = None,
508
+ inputs_embeds: Optional[torch.FloatTensor] = None,
509
+ output_attentions: Optional[bool] = None,
510
+ output_hidden_states: Optional[bool] = None,
511
+ return_dict: Optional[bool] = None,
512
+ **flash_attn_kwargs: Unpack[FlashAttentionKwargs],
513
+ ) -> Union[Tuple, BaseModelOutput]:
514
+ output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
515
+ output_hidden_states = (
516
+ output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
517
+ )
518
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
519
+
520
+ if (input_ids is None) ^ (inputs_embeds is not None):
521
+ raise ValueError("You must specify exactly one of input_ids or inputs_embeds")
522
+
523
+ if inputs_embeds is None:
524
+ inputs_embeds = self.embed_tokens(input_ids)
525
+
526
+ if attention_mask is not None and self.config._attn_implementation != "flash_attention_2":
527
+ mask = self.mask_converter.to_4d(attention_mask, attention_mask.shape[1], inputs_embeds.dtype)
528
+ else:
529
+ mask = attention_mask
530
+
531
+ hidden_states = inputs_embeds
532
+
533
+ # create position embeddings to be shared across the encoder layers
534
+ if position_ids is None:
535
+ position_ids = torch.arange(inputs_embeds.shape[1], device=inputs_embeds.device).unsqueeze(0)
536
+ position_embeddings = self.rotary_emb(hidden_states, position_ids)
537
+
538
+ # encoder layers
539
+ all_hidden_states = () if output_hidden_states else None
540
+ all_self_attns = () if output_attentions else None
541
+
542
+ for encoder_layer in self.layers[: self.config.num_hidden_layers]:
543
+ if output_hidden_states:
544
+ all_hidden_states += (hidden_states,)
545
+
546
+ if self.gradient_checkpointing and self.training:
547
+ layer_outputs = self._gradient_checkpointing_func(
548
+ encoder_layer.__call__,
549
+ hidden_states,
550
+ mask,
551
+ position_ids,
552
+ None,
553
+ output_attentions,
554
+ False,
555
+ None,
556
+ position_embeddings,
557
+ )
558
+ else:
559
+ layer_outputs = encoder_layer(
560
+ hidden_states,
561
+ attention_mask=mask,
562
+ position_ids=position_ids,
563
+ output_attentions=output_attentions,
564
+ position_embeddings=position_embeddings,
565
+ **flash_attn_kwargs,
566
+ )
567
+
568
+ hidden_states = layer_outputs[0]
569
+
570
+ if output_attentions:
571
+ all_self_attns += (layer_outputs[1],)
572
+
573
+ hidden_states = self.norm(hidden_states)
574
+
575
+ # add hidden states from the last encoder layer
576
+ if output_hidden_states:
577
+ all_hidden_states += (hidden_states,)
578
+
579
+ output = BaseModelOutput(
580
+ last_hidden_state=hidden_states,
581
+ hidden_states=all_hidden_states,
582
+ attentions=all_self_attns,
583
+ )
584
+ return output if return_dict else output.to_tuple()
585
+
586
+ def _update_causal_mask(
587
+ self,
588
+ attention_mask: torch.Tensor,
589
+ input_tensor: torch.Tensor,
590
+ cache_position: torch.Tensor,
591
+ past_key_values: Cache,
592
+ output_attentions: bool,
593
+ ):
594
+ if self.config._attn_implementation == "flash_attention_2":
595
+ if attention_mask is not None and (attention_mask == 0.0).any():
596
+ return attention_mask
597
+ return None
598
+
599
+ # For SDPA, when possible, we will rely on its `is_causal` argument instead of its `attn_mask` argument, in
600
+ # order to dispatch on Flash Attention 2. This feature is not compatible with static cache, as SDPA will fail
601
+ # to infer the attention mask.
602
+ past_seen_tokens = past_key_values.get_seq_length() if past_key_values is not None else 0
603
+ using_static_cache = isinstance(past_key_values, StaticCache)
604
+
605
+ # When output attentions is True, sdpa implementation's forward method calls the eager implementation's forward
606
+ if self.config._attn_implementation == "sdpa" and not using_static_cache and not output_attentions:
607
+ if AttentionMaskConverter._ignore_causal_mask_sdpa(
608
+ attention_mask,
609
+ inputs_embeds=input_tensor,
610
+ past_key_values_length=past_seen_tokens,
611
+ is_training=self.training,
612
+ ):
613
+ return None
614
+
615
+ dtype, device = input_tensor.dtype, input_tensor.device
616
+ sequence_length = input_tensor.shape[1]
617
+ if using_static_cache:
618
+ target_length = past_key_values.get_max_cache_shape()
619
+ else:
620
+ target_length = (
621
+ attention_mask.shape[-1]
622
+ if isinstance(attention_mask, torch.Tensor)
623
+ else past_seen_tokens + sequence_length + 1
624
+ )
625
+
626
+ # In case the provided `attention` mask is 2D, we generate a causal mask here (4D).
627
+ causal_mask = self._prepare_4d_causal_attention_mask_with_cache_position(
628
+ attention_mask,
629
+ sequence_length=sequence_length,
630
+ target_length=target_length,
631
+ dtype=dtype,
632
+ device=device,
633
+ cache_position=cache_position,
634
+ batch_size=input_tensor.shape[0],
635
+ )
636
+
637
+ if (
638
+ self.config._attn_implementation == "sdpa"
639
+ and attention_mask is not None
640
+ and attention_mask.device.type in ["cuda", "xpu"]
641
+ and not output_attentions
642
+ ):
643
+ # Attend to all tokens in fully masked rows in the causal_mask, for example the relevant first rows when
644
+ # using left padding. This is required by F.scaled_dot_product_attention memory-efficient attention path.
645
+ # Details: https://github.com/pytorch/pytorch/issues/110213
646
+ min_dtype = torch.finfo(dtype).min
647
+ causal_mask = AttentionMaskConverter._unmask_unattended(causal_mask, min_dtype)
648
+
649
+ return causal_mask
650
+
651
+ @staticmethod
652
+ def _prepare_4d_causal_attention_mask_with_cache_position(
653
+ attention_mask: torch.Tensor,
654
+ sequence_length: int,
655
+ target_length: int,
656
+ dtype: torch.dtype,
657
+ device: torch.device,
658
+ cache_position: torch.Tensor,
659
+ batch_size: int,
660
+ **kwargs,
661
+ ):
662
+ """
663
+ Creates a causal 4D mask of shape `(batch_size, 1, query_length, key_value_length)` from a 2D mask of shape
664
+ `(batch_size, key_value_length)`, or if the input `attention_mask` is already 4D, do nothing.
665
+
666
+ Args:
667
+ attention_mask (`torch.Tensor`):
668
+ A 2D attention mask of shape `(batch_size, key_value_length)` or a 4D attention mask of shape
669
+ `(batch_size, 1, query_length, key_value_length)`.
670
+ sequence_length (`int`):
671
+ The sequence length being processed.
672
+ target_length (`int`):
673
+ The target length: when generating with static cache, the mask should be as long as the static cache,
674
+ to account for the 0 padding, the part of the cache that is not filled yet.
675
+ dtype (`torch.dtype`):
676
+ The dtype to use for the 4D attention mask.
677
+ device (`torch.device`):
678
+ The device to place the 4D attention mask on.
679
+ cache_position (`torch.Tensor`):
680
+ Indices depicting the position of the input sequence tokens in the sequence.
681
+ batch_size (`int`):
682
+ Batch size.
683
+ """
684
+ if attention_mask is not None and attention_mask.dim() == 4:
685
+ # In this case we assume that the mask comes already in inverted form and requires no inversion or slicing.
686
+ causal_mask = attention_mask
687
+ else:
688
+ min_dtype = torch.finfo(dtype).min
689
+ causal_mask = torch.full(
690
+ (sequence_length, target_length), fill_value=min_dtype, dtype=dtype, device=device
691
+ )
692
+ if sequence_length != 1:
693
+ causal_mask = torch.triu(causal_mask, diagonal=1)
694
+ causal_mask *= torch.arange(target_length, device=device) > cache_position.reshape(-1, 1)
695
+ causal_mask = causal_mask[None, None, :, :].expand(batch_size, 1, -1, -1)
696
+ if attention_mask is not None:
697
+ causal_mask = causal_mask.clone() # copy to contiguous memory for in-place edit
698
+ mask_length = attention_mask.shape[-1]
699
+ padding_mask = causal_mask[:, :, :, :mask_length] + attention_mask[:, None, None, :].to(
700
+ causal_mask.device
701
+ )
702
+ padding_mask = padding_mask == 0
703
+ causal_mask[:, :, :, :mask_length] = causal_mask[:, :, :, :mask_length].masked_fill(
704
+ padding_mask, min_dtype
705
+ )
706
+
707
+ return causal_mask
708
+
709
+
710
+ @add_start_docstrings(
711
+ "The EuroBert Model with a decoder head on top that is used for masked language modeling.",
712
+ EUROBERT_START_DOCSTRING,
713
+ )
714
+ class EuroBertForMaskedLM(EuroBertPreTrainedModel):
715
+ def __init__(self, config: EuroBertConfig):
716
+ super().__init__(config)
717
+ self.model = EuroBertModel(config)
718
+ self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, config.mlp_bias)
719
+ self.post_init()
720
+
721
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
722
+ @add_code_sample_docstrings(
723
+ checkpoint=_CHECKPOINT_FOR_DOC,
724
+ output_type=MaskedLMOutput,
725
+ config_class=_CONFIG_FOR_DOC,
726
+ )
727
+ def forward(
728
+ self,
729
+ input_ids: Optional[torch.LongTensor] = None,
730
+ attention_mask: Optional[torch.Tensor] = None,
731
+ position_ids: Optional[torch.LongTensor] = None,
732
+ inputs_embeds: Optional[torch.FloatTensor] = None,
733
+ labels: Optional[torch.LongTensor] = None,
734
+ output_attentions: Optional[bool] = None,
735
+ output_hidden_states: Optional[bool] = None,
736
+ return_dict: Optional[bool] = None,
737
+ ) -> Union[Tuple[torch.Tensor], MaskedLMOutput]:
738
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
739
+
740
+ encoder_output = self.model(
741
+ input_ids,
742
+ attention_mask=attention_mask,
743
+ position_ids=position_ids,
744
+ inputs_embeds=inputs_embeds,
745
+ output_attentions=output_attentions,
746
+ output_hidden_states=output_hidden_states,
747
+ return_dict=return_dict,
748
+ )
749
+
750
+ prediction_scores = self.lm_head(encoder_output[0])
751
+ masked_lm_loss = None
752
+ if labels is not None:
753
+ labels = labels.to(prediction_scores.device)
754
+ masked_lm_loss = self.loss_function(prediction_scores, labels, vocab_size=self.config.vocab_size)
755
+
756
+ if not return_dict:
757
+ output = (prediction_scores,) + encoder_output[1:]
758
+ return ((masked_lm_loss,) + output) if masked_lm_loss is not None else output
759
+
760
+ return MaskedLMOutput(
761
+ loss=masked_lm_loss,
762
+ logits=prediction_scores,
763
+ hidden_states=encoder_output.hidden_states,
764
+ attentions=encoder_output.attentions,
765
+ )
766
+
767
+
768
+ @add_start_docstrings(
769
+ "The EuroBert Model with a sequence classification head on top that performs pooling.",
770
+ EUROBERT_START_DOCSTRING,
771
+ )
772
+ class EuroBertForSequenceClassification(EuroBertPreTrainedModel):
773
+ def __init__(self, config: EuroBertConfig):
774
+ super().__init__(config)
775
+ self.num_labels = config.num_labels
776
+ self.clf_pooling = config.clf_pooling
777
+
778
+ self.model = EuroBertModel(config)
779
+ self.dense = nn.Linear(config.hidden_size, config.hidden_size)
780
+ self.activation = nn.GELU()
781
+ self.classifier = nn.Linear(config.hidden_size, self.num_labels)
782
+ self.post_init()
783
+
784
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
785
+ @add_code_sample_docstrings(
786
+ checkpoint=_CHECKPOINT_FOR_DOC,
787
+ output_type=SequenceClassifierOutput,
788
+ config_class=_CONFIG_FOR_DOC,
789
+ )
790
+ def forward(
791
+ self,
792
+ input_ids: Optional[torch.LongTensor] = None,
793
+ attention_mask: Optional[torch.Tensor] = None,
794
+ position_ids: Optional[torch.LongTensor] = None,
795
+ inputs_embeds: Optional[torch.FloatTensor] = None,
796
+ labels: Optional[torch.LongTensor] = None,
797
+ output_attentions: Optional[bool] = None,
798
+ output_hidden_states: Optional[bool] = None,
799
+ return_dict: Optional[bool] = None,
800
+ ) -> Union[Tuple[torch.Tensor], SequenceClassifierOutput]:
801
+ r"""
802
+ labels (`torch.LongTensor` of shape `(batch_size,)`, *optional*):
803
+ Labels for computing the sequence classification/regression loss. Indices should be in `[0, ...,
804
+ config.num_labels - 1]`. If `config.num_labels == 1` a regression loss is computed (Mean-Square loss); if
805
+ `config.num_labels > 1` a classification loss is computed (Cross-Entropy).
806
+ """
807
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
808
+
809
+ encoder_output = self.model(
810
+ input_ids,
811
+ attention_mask=attention_mask,
812
+ position_ids=position_ids,
813
+ inputs_embeds=inputs_embeds,
814
+ output_attentions=output_attentions,
815
+ output_hidden_states=output_hidden_states,
816
+ return_dict=return_dict,
817
+ )
818
+ last_hidden_state = encoder_output[0]
819
+
820
+ if self.clf_pooling in ["bos", "mean"]:
821
+ if self.clf_pooling == "bos":
822
+ pooled_output = last_hidden_state[:, 0]
823
+
824
+ elif self.clf_pooling == "mean":
825
+ if attention_mask is None:
826
+ pooled_output = last_hidden_state.mean(dim=1)
827
+ else:
828
+ pooled_output = (last_hidden_state * attention_mask.unsqueeze(-1)).sum(dim=1)
829
+ pooled_output /= attention_mask.sum(dim=1, keepdim=True)
830
+
831
+ pooled_output = self.dense(pooled_output)
832
+ pooled_output = self.activation(pooled_output)
833
+ logits = self.classifier(pooled_output)
834
+
835
+ elif self.clf_pooling == "late":
836
+ x = self.dense(last_hidden_state)
837
+ x = self.activation(x)
838
+ logits = self.classifier(x)
839
+ if attention_mask is None:
840
+ logits = logits.mean(dim=1)
841
+ else:
842
+ logits = (logits * attention_mask.unsqueeze(-1)).sum(dim=1)
843
+ logits /= attention_mask.sum(dim=1, keepdim=True)
844
+
845
+ loss = None
846
+ if labels is not None:
847
+ labels = labels.to(logits.device)
848
+ if self.config.problem_type is None:
849
+ if self.num_labels == 1:
850
+ self.config.problem_type = "regression"
851
+ elif self.num_labels > 1 and (labels.dtype == torch.long or labels.dtype == torch.int):
852
+ self.config.problem_type = "single_label_classification"
853
+ else:
854
+ self.config.problem_type = "multi_label_classification"
855
+
856
+ if self.config.problem_type == "regression":
857
+ loss_fct = MSELoss()
858
+ if self.num_labels == 1:
859
+ loss = loss_fct(logits.squeeze(), labels.squeeze())
860
+ else:
861
+ loss = loss_fct(logits, labels)
862
+ elif self.config.problem_type == "single_label_classification":
863
+ loss_fct = CrossEntropyLoss()
864
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
865
+ elif self.config.problem_type == "multi_label_classification":
866
+ loss_fct = BCEWithLogitsLoss()
867
+ loss = loss_fct(logits, labels)
868
+
869
+ if not return_dict:
870
+ output = (logits,) + encoder_output[1:]
871
+ return ((loss,) + output) if loss is not None else output
872
+
873
+ return SequenceClassifierOutput(
874
+ loss=loss,
875
+ logits=logits,
876
+ hidden_states=encoder_output.hidden_states,
877
+ attentions=encoder_output.attentions,
878
+ )
879
+
880
+
881
+ @add_start_docstrings(
882
+ """
883
+ The EuroBert Model with a token classification head on top (a linear layer on top of the hidden-states
884
+ output) e.g. for Named-Entity-Recognition (NER) tasks.
885
+ """,
886
+ EUROBERT_START_DOCSTRING,
887
+ )
888
+ class EuroBertForTokenClassification(EuroBertPreTrainedModel):
889
+ def __init__(self, config: EuroBertConfig):
890
+ super().__init__(config)
891
+ self.num_labels = config.num_labels
892
+ self.model = EuroBertModel(config)
893
+
894
+ self.classifier = nn.Linear(config.hidden_size, config.num_labels)
895
+ self.post_init()
896
+
897
+ def get_input_embeddings(self):
898
+ return self.model.embed_tokens
899
+
900
+ def set_input_embeddings(self, value):
901
+ self.model.embed_tokens = value
902
+
903
+ @add_start_docstrings_to_model_forward(EUROBERT_INPUTS_DOCSTRING)
904
+ def forward(
905
+ self,
906
+ input_ids: Optional[torch.LongTensor] = None,
907
+ attention_mask: Optional[torch.Tensor] = None,
908
+ position_ids: Optional[torch.LongTensor] = None,
909
+ inputs_embeds: Optional[torch.FloatTensor] = None,
910
+ labels: Optional[torch.LongTensor] = None,
911
+ use_cache: Optional[bool] = None,
912
+ output_attentions: Optional[bool] = None,
913
+ output_hidden_states: Optional[bool] = None,
914
+ return_dict: Optional[bool] = None,
915
+ ) -> Union[Tuple, TokenClassifierOutput]:
916
+ r"""
917
+ labels (`torch.LongTensor` of shape `(batch_size, sequence_length)`, *optional*):
918
+ Labels for computing the token classification loss. Indices should be in `[0, ...,
919
+ config.num_labels - 1]`; a cross-entropy loss is computed over all token positions.
921
+ """
922
+ return_dict = return_dict if return_dict is not None else self.config.use_return_dict
923
+
924
+ outputs = self.model(
925
+ input_ids,
926
+ attention_mask=attention_mask,
927
+ position_ids=position_ids,
928
+ inputs_embeds=inputs_embeds,
929
+ use_cache=use_cache,
930
+ output_attentions=output_attentions,
931
+ output_hidden_states=output_hidden_states,
932
+ return_dict=return_dict,
933
+ )
934
+ sequence_output = outputs[0]
935
+ logits = self.classifier(sequence_output)
936
+
937
+ loss = None
938
+ if labels is not None:
939
+ loss_fct = CrossEntropyLoss()
940
+ loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
941
+
942
+ if not return_dict:
943
+ output = (logits,) + outputs[2:]
944
+ return ((loss,) + output) if loss is not None else output
945
+
946
+ return TokenClassifierOutput(
947
+ loss=loss,
948
+ logits=logits,
949
+ hidden_states=outputs.hidden_states,
950
+ attentions=outputs.attentions,
951
+ )
952
+
953
+
954
+ __all__ = [
955
+ "EuroBertPreTrainedModel",
956
+ "EuroBertModel",
957
+ "EuroBertForMaskedLM",
958
+ "EuroBertForSequenceClassification",
959
+ "EuroBertForTokenClassification",
960
+ ]
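The `"mean"` branch of `EuroBertForSequenceClassification.forward` above averages hidden states over non-padding positions only. The following is a minimal, dependency-free sketch of that masked mean for a single sequence; the function name `masked_mean_pool` and the plain-list representation are illustrative assumptions, not part of the uploaded code, which operates on batched `torch` tensors.

```python
from typing import List, Optional


def masked_mean_pool(
    hidden: List[List[float]], mask: Optional[List[int]] = None
) -> List[float]:
    """Average token vectors, counting only positions where mask == 1.

    hidden: [seq_len][hidden_size] vectors for one sequence.
    mask:   [seq_len] of 0/1; None means every position counts.
    """
    if mask is None:
        mask = [1] * len(hidden)
    kept = sum(mask)  # number of real (non-padding) tokens
    pooled = [0.0] * len(hidden[0])
    for vec, m in zip(hidden, mask):
        for j, value in enumerate(vec):
            pooled[j] += value * m  # padded positions contribute zero
    # Like the tensor version, this divides by the token count, so an
    # all-zero mask is not handled (here it raises ZeroDivisionError).
    return [p / kept for p in pooled]


# The third position is padding and must not affect the result.
pooled_example = masked_mean_pool(
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]], mask=[1, 1, 0]
)
```

The `"late"` branch applies the same masked average, but to per-token logits after the dense layer and classifier rather than to the hidden states.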
openmed_vs_sota_grouped_bars.png ADDED

Git LFS Details

  • SHA256: 626b37d9b20c44e26c92a8b5bf774107393ae0ad0b482d8e7cb3dc31d960f611
  • Pointer size: 131 Bytes
  • Size of remote file: 497 kB
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<|begin_of_text|>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "eos_token": {
10
+ "content": "<|end_of_text|>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "mask_token": {
17
+ "content": "<|mask|>",
18
+ "lstrip": true,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "pad_token": {
24
+ "content": "<|pad|>",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ }
30
+ }
test_results.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "eval_accuracy": 0.995875073659399,
3
+ "eval_f1": 0.990410516108178,
4
+ "eval_loss": 0.3030967712402344,
5
+ "eval_precision": 0.9994780793319415,
6
+ "eval_recall": 0.9815060009396489
7
+ }
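As a quick consistency check on the metrics above, F1 can be recomputed as the harmonic mean of precision and recall. This is a sketch under the assumption that `eval_f1` was derived from the reported `eval_precision` and `eval_recall` in the usual way; the helper name `f1_from_precision_recall` is hypothetical.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values copied from test_results.json in this commit.
results = {
    "eval_f1": 0.990410516108178,
    "eval_precision": 0.9994780793319415,
    "eval_recall": 0.9815060009396489,
}

recomputed_f1 = f1_from_precision_recall(
    results["eval_precision"], results["eval_recall"]
)
print(f"reported={results['eval_f1']:.6f} recomputed={recomputed_f1:.6f}")
```

The recomputed value agrees with the reported `eval_f1` to within rounding, which suggests the three figures are internally consistent.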
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a5ced525276e4f0b096912a287a1962dbcc14e8addd12b1c89f03a52ef0cbb14
3
+ size 17210345
tokenizer_config.json ADDED
@@ -0,0 +1,2068 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "added_tokens_decoder": {
3
+ "128000": {
4
+ "content": "<|begin_of_text|>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "128001": {
12
+ "content": "<|end_of_text|>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "128002": {
20
+ "content": "<|mask|>",
21
+ "lstrip": true,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "128003": {
28
+ "content": "<|parallel_sep|>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "128004": {
36
+ "content": "<|pad|>",
37
+ "lstrip": false,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ },
43
+ "128005": {
44
+ "content": "<|reserved_special_token_2|>",
45
+ "lstrip": false,
46
+ "normalized": false,
47
+ "rstrip": false,
48
+ "single_word": false,
49
+ "special": true
50
+ },
51
+ "128006": {
52
+ "content": "<|start_header_id|>",
53
+ "lstrip": false,
54
+ "normalized": false,
55
+ "rstrip": false,
56
+ "single_word": false,
57
+ "special": true
58
+ },
59
+ "128007": {
60
+ "content": "<|end_header_id|>",
61
+ "lstrip": false,
62
+ "normalized": false,
63
+ "rstrip": false,
64
+ "single_word": false,
65
+ "special": true
66
+ },
67
+ "128008": {
68
+ "content": "<|eom_id|>",
69
+ "lstrip": false,
70
+ "normalized": false,
71
+ "rstrip": false,
72
+ "single_word": false,
73
+ "special": true
74
+ },
75
+ "128009": {
76
+ "content": "<|eot_id|>",
77
+ "lstrip": false,
78
+ "normalized": false,
79
+ "rstrip": false,
80
+ "single_word": false,
81
+ "special": true
82
+ },
83
+ "128010": {
84
+ "content": "<|python_tag|>",
85
+ "lstrip": false,
86
+ "normalized": false,
87
+ "rstrip": false,
88
+ "single_word": false,
89
+ "special": true
90
+ },
91
+ "128011": {
92
+ "content": "<|reserved_special_token_3|>",
93
+ "lstrip": false,
94
+ "normalized": false,
95
+ "rstrip": false,
96
+ "single_word": false,
97
+ "special": true
98
+ },
99
+ "128012": {
100
+ "content": "<|reserved_special_token_4|>",
101
+ "lstrip": false,
102
+ "normalized": false,
103
+ "rstrip": false,
104
+ "single_word": false,
105
+ "special": true
106
+ },
107
+ "128013": {
108
+ "content": "<|reserved_special_token_5|>",
109
+ "lstrip": false,
110
+ "normalized": false,
111
+ "rstrip": false,
112
+ "single_word": false,
113
+ "special": true
114
+ },
115
+ "128014": {
116
+ "content": "<|reserved_special_token_6|>",
117
+ "lstrip": false,
118
+ "normalized": false,
119
+ "rstrip": false,
120
+ "single_word": false,
121
+ "special": true
122
+ },
123
+ "128015": {
124
+ "content": "<|reserved_special_token_7|>",
125
+ "lstrip": false,
126
+ "normalized": false,
127
+ "rstrip": false,
128
+ "single_word": false,
129
+ "special": true
130
+ },
131
+ "128016": {
132
+ "content": "<|reserved_special_token_8|>",
133
+ "lstrip": false,
134
+ "normalized": false,
135
+ "rstrip": false,
136
+ "single_word": false,
137
+ "special": true
138
+ },
139
+ "128017": {
140
+ "content": "<|reserved_special_token_9|>",
141
+ "lstrip": false,
142
+ "normalized": false,
143
+ "rstrip": false,
144
+ "single_word": false,
145
+ "special": true
146
+ },
147
+ "128018": {
148
+ "content": "<|reserved_special_token_10|>",
149
+ "lstrip": false,
150
+ "normalized": false,
151
+ "rstrip": false,
152
+ "single_word": false,
153
+ "special": true
154
+ },
155
+ "128019": {
156
+ "content": "<|reserved_special_token_11|>",
157
+ "lstrip": false,
158
+ "normalized": false,
159
+ "rstrip": false,
160
+ "single_word": false,
161
+ "special": true
162
+ },
163
+ "128020": {
164
+ "content": "<|reserved_special_token_12|>",
165
+ "lstrip": false,
166
+ "normalized": false,
167
+ "rstrip": false,
168
+ "single_word": false,
169
+ "special": true
170
+ },
171
+ "128021": {
172
+ "content": "<|reserved_special_token_13|>",
173
+ "lstrip": false,
174
+ "normalized": false,
175
+ "rstrip": false,
176
+ "single_word": false,
177
+ "special": true
178
+ },
179
+ "128022": {
180
+ "content": "<|reserved_special_token_14|>",
181
+ "lstrip": false,
182
+ "normalized": false,
183
+ "rstrip": false,
184
+ "single_word": false,
185
+ "special": true
186
+ },
187
+ "128023": {
188
+ "content": "<|reserved_special_token_15|>",
189
+ "lstrip": false,
190
+ "normalized": false,
191
+ "rstrip": false,
192
+ "single_word": false,
193
+ "special": true
194
+ },
195
+ "128024": {
196
+ "content": "<|reserved_special_token_16|>",
197
+ "lstrip": false,
198
+ "normalized": false,
199
+ "rstrip": false,
200
+ "single_word": false,
201
+ "special": true
202
+ },
203
+ "128025": {
204
+ "content": "<|reserved_special_token_17|>",
205
+ "lstrip": false,
206
+ "normalized": false,
207
+ "rstrip": false,
208
+ "single_word": false,
209
+ "special": true
210
+ },
211
+ "128026": {
212
+ "content": "<|reserved_special_token_18|>",
213
+ "lstrip": false,
214
+ "normalized": false,
215
+ "rstrip": false,
216
+ "single_word": false,
217
+ "special": true
218
+ },
219
+ "128027": {
220
+ "content": "<|reserved_special_token_19|>",
221
+ "lstrip": false,
222
+ "normalized": false,
223
+ "rstrip": false,
224
+ "single_word": false,
225
+ "special": true
226
+ },
227
+ "128028": {
228
+ "content": "<|reserved_special_token_20|>",
229
+ "lstrip": false,
230
+ "normalized": false,
231
+ "rstrip": false,
232
+ "single_word": false,
233
+ "special": true
234
+ },
235
+ "128029": {
236
+ "content": "<|reserved_special_token_21|>",
237
+ "lstrip": false,
238
+ "normalized": false,
239
+ "rstrip": false,
240
+ "single_word": false,
241
+ "special": true
242
+ },
243
+ "128030": {
244
+ "content": "<|reserved_special_token_22|>",
245
+ "lstrip": false,
246
+ "normalized": false,
247
+ "rstrip": false,
248
+ "single_word": false,
249
+ "special": true
250
+ },
251
+ "128031": {
252
+ "content": "<|reserved_special_token_23|>",
253
+ "lstrip": false,
254
+ "normalized": false,
255
+ "rstrip": false,
256
+ "single_word": false,
257
+ "special": true
258
+ },
259
+ "128032": {
260
+ "content": "<|reserved_special_token_24|>",
261
+ "lstrip": false,
262
+ "normalized": false,
263
+ "rstrip": false,
264
+ "single_word": false,
265
+ "special": true
266
+ },
267
+ "128033": {
268
+ "content": "<|reserved_special_token_25|>",
269
+ "lstrip": false,
270
+ "normalized": false,
271
+ "rstrip": false,
272
+ "single_word": false,
273
+ "special": true
274
+ },
275
+ "128034": {
276
+ "content": "<|reserved_special_token_26|>",
277
+ "lstrip": false,
278
+ "normalized": false,
279
+ "rstrip": false,
280
+ "single_word": false,
281
+ "special": true
282
+ },
283
+ "128035": {
284
+ "content": "<|reserved_special_token_27|>",
285
+ "lstrip": false,
286
+ "normalized": false,
287
+ "rstrip": false,
288
+ "single_word": false,
289
+ "special": true
290
+ },
291
+ "128036": {
292
+ "content": "<|reserved_special_token_28|>",
293
+ "lstrip": false,
294
+ "normalized": false,
295
+ "rstrip": false,
296
+ "single_word": false,
297
+ "special": true
298
+ },
299
+ "128037": {
300
+ "content": "<|reserved_special_token_29|>",
301
+ "lstrip": false,
302
+ "normalized": false,
303
+ "rstrip": false,
304
+ "single_word": false,
305
+ "special": true
306
+ },
307
+ "128038": {
308
+ "content": "<|reserved_special_token_30|>",
309
+ "lstrip": false,
310
+ "normalized": false,
311
+ "rstrip": false,
312
+ "single_word": false,
313
+ "special": true
314
+ },
315
+ "128039": {
316
+ "content": "<|reserved_special_token_31|>",
317
+ "lstrip": false,
318
+ "normalized": false,
319
+ "rstrip": false,
320
+ "single_word": false,
321
+ "special": true
322
+ },
323
+ "128040": {
324
+ "content": "<|reserved_special_token_32|>",
325
+ "lstrip": false,
326
+ "normalized": false,
327
+ "rstrip": false,
328
+ "single_word": false,
329
+ "special": true
330
+ },
331
+ "128041": {
332
+ "content": "<|reserved_special_token_33|>",
333
+ "lstrip": false,
334
+ "normalized": false,
335
+ "rstrip": false,
336
+ "single_word": false,
337
+ "special": true
338
+ },
339
+ "128042": {
340
+ "content": "<|reserved_special_token_34|>",
341
+ "lstrip": false,
342
+ "normalized": false,
343
+ "rstrip": false,
344
+ "single_word": false,
345
+ "special": true
346
+ },
347
+ "128043": {
348
+ "content": "<|reserved_special_token_35|>",
349
+ "lstrip": false,
350
+ "normalized": false,
351
+ "rstrip": false,
352
+ "single_word": false,
353
+ "special": true
354
+ },
355
+ "128044": {
356
+ "content": "<|reserved_special_token_36|>",
357
+ "lstrip": false,
358
+ "normalized": false,
359
+ "rstrip": false,
360
+ "single_word": false,
361
+ "special": true
362
+ },
363
+ "128045": {
364
+ "content": "<|reserved_special_token_37|>",
365
+ "lstrip": false,
366
+ "normalized": false,
367
+ "rstrip": false,
368
+ "single_word": false,
369
+ "special": true
370
+ },
371
+ "128046": {
372
+ "content": "<|reserved_special_token_38|>",
373
+ "lstrip": false,
374
+ "normalized": false,
375
+ "rstrip": false,
376
+ "single_word": false,
377
+ "special": true
378
+ },
379
+ "128047": {
380
+ "content": "<|reserved_special_token_39|>",
381
+ "lstrip": false,
382
+ "normalized": false,
383
+ "rstrip": false,
384
+ "single_word": false,
385
+ "special": true
386
+ },
387
+ "128048": {
388
+ "content": "<|reserved_special_token_40|>",
389
+ "lstrip": false,
390
+ "normalized": false,
391
+ "rstrip": false,
392
+ "single_word": false,
393
+ "special": true
394
+ },
395
+ "128049": {
396
+ "content": "<|reserved_special_token_41|>",
397
+ "lstrip": false,
398
+ "normalized": false,
399
+ "rstrip": false,
400
+ "single_word": false,
401
+ "special": true
402
+ },
403
+ "128050": {
404
+ "content": "<|reserved_special_token_42|>",
405
+ "lstrip": false,
406
+ "normalized": false,
407
+ "rstrip": false,
408
+ "single_word": false,
409
+ "special": true
410
+ },
411
+ "128051": {
412
+ "content": "<|reserved_special_token_43|>",
413
+ "lstrip": false,
414
+ "normalized": false,
415
+ "rstrip": false,
416
+ "single_word": false,
417
+ "special": true
418
+ },
419
+ "128052": {
420
+ "content": "<|reserved_special_token_44|>",
421
+ "lstrip": false,
422
+ "normalized": false,
423
+ "rstrip": false,
424
+ "single_word": false,
425
+ "special": true
426
+ },
427
+ "128053": {
428
+ "content": "<|reserved_special_token_45|>",
429
+ "lstrip": false,
430
+ "normalized": false,
431
+ "rstrip": false,
432
+ "single_word": false,
433
+ "special": true
434
+ },
435
+ "128054": {
436
+ "content": "<|reserved_special_token_46|>",
437
+ "lstrip": false,
438
+ "normalized": false,
439
+ "rstrip": false,
440
+ "single_word": false,
441
+ "special": true
442
+ },
443
+ "128055": {
444
+ "content": "<|reserved_special_token_47|>",
445
+ "lstrip": false,
446
+ "normalized": false,
447
+ "rstrip": false,
448
+ "single_word": false,
449
+ "special": true
450
+ },
451
+ "128056": {
452
+ "content": "<|reserved_special_token_48|>",
453
+ "lstrip": false,
454
+ "normalized": false,
455
+ "rstrip": false,
456
+ "single_word": false,
457
+ "special": true
458
+ },
459
+ "128057": {
460
+ "content": "<|reserved_special_token_49|>",
461
+ "lstrip": false,
462
+ "normalized": false,
463
+ "rstrip": false,
464
+ "single_word": false,
465
+ "special": true
466
+ },
467
+ "128058": {
468
+ "content": "<|reserved_special_token_50|>",
469
+ "lstrip": false,
470
+ "normalized": false,
471
+ "rstrip": false,
472
+ "single_word": false,
473
+ "special": true
474
+ },
475
+ "128059": {
476
+ "content": "<|reserved_special_token_51|>",
477
+ "lstrip": false,
478
+ "normalized": false,
479
+ "rstrip": false,
480
+ "single_word": false,
481
+ "special": true
482
+ },
483
+ "128060": {
484
+ "content": "<|reserved_special_token_52|>",
485
+ "lstrip": false,
486
+ "normalized": false,
487
+ "rstrip": false,
488
+ "single_word": false,
489
+ "special": true
490
+ },
491
+ "128061": {
492
+ "content": "<|reserved_special_token_53|>",
493
+ "lstrip": false,
494
+ "normalized": false,
495
+ "rstrip": false,
496
+ "single_word": false,
497
+ "special": true
498
+ },
499
+ "128062": {
500
+ "content": "<|reserved_special_token_54|>",
501
+ "lstrip": false,
502
+ "normalized": false,
503
+ "rstrip": false,
504
+ "single_word": false,
505
+ "special": true
506
+ },
507
+ "128063": {
508
+ "content": "<|reserved_special_token_55|>",
509
+ "lstrip": false,
510
+ "normalized": false,
511
+ "rstrip": false,
512
+ "single_word": false,
513
+ "special": true
514
+ },
515
+ "128064": {
516
+ "content": "<|reserved_special_token_56|>",
517
+ "lstrip": false,
518
+ "normalized": false,
519
+ "rstrip": false,
520
+ "single_word": false,
521
+ "special": true
522
+ },
523
+ "128065": {
524
+ "content": "<|reserved_special_token_57|>",
525
+ "lstrip": false,
526
+ "normalized": false,
527
+ "rstrip": false,
528
+ "single_word": false,
529
+ "special": true
530
+ },
531
+ "128066": {
532
+ "content": "<|reserved_special_token_58|>",
533
+ "lstrip": false,
534
+ "normalized": false,
535
+ "rstrip": false,
536
+ "single_word": false,
537
+ "special": true
538
+ },
539
+ "128067": {
540
+ "content": "<|reserved_special_token_59|>",
541
+ "lstrip": false,
542
+ "normalized": false,
543
+ "rstrip": false,
544
+ "single_word": false,
545
+ "special": true
546
+ },
547
+ "128068": {
548
+ "content": "<|reserved_special_token_60|>",
549
+ "lstrip": false,
550
+ "normalized": false,
551
+ "rstrip": false,
552
+ "single_word": false,
553
+ "special": true
554
+ },
555
+ "128069": {
556
+ "content": "<|reserved_special_token_61|>",
557
+ "lstrip": false,
558
+ "normalized": false,
559
+ "rstrip": false,
560
+ "single_word": false,
561
+ "special": true
562
+ },
563
+ "128070": {
564
+ "content": "<|reserved_special_token_62|>",
565
+ "lstrip": false,
566
+ "normalized": false,
567
+ "rstrip": false,
568
+ "single_word": false,
569
+ "special": true
570
+ },
571
+ "128071": {
572
+ "content": "<|reserved_special_token_63|>",
573
+ "lstrip": false,
574
+ "normalized": false,
575
+ "rstrip": false,
576
+ "single_word": false,
577
+ "special": true
578
+ },
579
+ "128072": {
580
+ "content": "<|reserved_special_token_64|>",
581
+ "lstrip": false,
582
+ "normalized": false,
583
+ "rstrip": false,
584
+ "single_word": false,
585
+ "special": true
586
+ },
587
+ "128073": {
588
+ "content": "<|reserved_special_token_65|>",
589
+ "lstrip": false,
590
+ "normalized": false,
591
+ "rstrip": false,
592
+ "single_word": false,
593
+ "special": true
594
+ },
595
+ "128074": {
596
+ "content": "<|reserved_special_token_66|>",
597
+ "lstrip": false,
598
+ "normalized": false,
599
+ "rstrip": false,
600
+ "single_word": false,
601
+ "special": true
602
+ },
603
+ "128075": {
604
+ "content": "<|reserved_special_token_67|>",
605
+ "lstrip": false,
606
+ "normalized": false,
607
+ "rstrip": false,
608
+ "single_word": false,
609
+ "special": true
610
+ },
611
+ "128076": {
612
+ "content": "<|reserved_special_token_68|>",
613
+ "lstrip": false,
614
+ "normalized": false,
615
+ "rstrip": false,
616
+ "single_word": false,
617
+ "special": true
618
+ },
619
+ "128077": {
620
+ "content": "<|reserved_special_token_69|>",
621
+ "lstrip": false,
622
+ "normalized": false,
623
+ "rstrip": false,
624
+ "single_word": false,
625
+ "special": true
626
+ },
627
+ "128078": {
628
+ "content": "<|reserved_special_token_70|>",
629
+ "lstrip": false,
630
+ "normalized": false,
631
+ "rstrip": false,
632
+ "single_word": false,
633
+ "special": true
634
+ },
635
+ "128079": {
636
+ "content": "<|reserved_special_token_71|>",
637
+ "lstrip": false,
638
+ "normalized": false,
639
+ "rstrip": false,
640
+ "single_word": false,
641
+ "special": true
642
+ },
643
+ "128080": {
644
+ "content": "<|reserved_special_token_72|>",
645
+ "lstrip": false,
646
+ "normalized": false,
647
+ "rstrip": false,
648
+ "single_word": false,
649
+ "special": true
650
+ },
651
+ "128081": {
652
+ "content": "<|reserved_special_token_73|>",
653
+ "lstrip": false,
654
+ "normalized": false,
655
+ "rstrip": false,
656
+ "single_word": false,
657
+ "special": true
658
+ },
659
+ "128082": {
660
+ "content": "<|reserved_special_token_74|>",
661
+ "lstrip": false,
662
+ "normalized": false,
663
+ "rstrip": false,
664
+ "single_word": false,
665
+ "special": true
666
+ },
667
+ "128083": {
668
+ "content": "<|reserved_special_token_75|>",
669
+ "lstrip": false,
670
+ "normalized": false,
671
+ "rstrip": false,
672
+ "single_word": false,
673
+ "special": true
674
+ },
675
+ "128084": {
676
+ "content": "<|reserved_special_token_76|>",
677
+ "lstrip": false,
678
+ "normalized": false,
679
+ "rstrip": false,
680
+ "single_word": false,
681
+ "special": true
682
+ },
683
+ "128085": {
684
+ "content": "<|reserved_special_token_77|>",
685
+ "lstrip": false,
686
+ "normalized": false,
687
+ "rstrip": false,
688
+ "single_word": false,
689
+ "special": true
690
+ },
691
+ "128086": {
692
+ "content": "<|reserved_special_token_78|>",
693
+ "lstrip": false,
694
+ "normalized": false,
695
+ "rstrip": false,
696
+ "single_word": false,
697
+ "special": true
698
+ },
699
+ "128087": {
700
+ "content": "<|reserved_special_token_79|>",
701
+ "lstrip": false,
702
+ "normalized": false,
703
+ "rstrip": false,
704
+ "single_word": false,
705
+ "special": true
706
+ },
707
+ "128088": {
708
+ "content": "<|reserved_special_token_80|>",
709
+ "lstrip": false,
710
+ "normalized": false,
711
+ "rstrip": false,
712
+ "single_word": false,
713
+ "special": true
714
+ },
715
+ "128089": {
716
+ "content": "<|reserved_special_token_81|>",
717
+ "lstrip": false,
718
+ "normalized": false,
719
+ "rstrip": false,
720
+ "single_word": false,
721
+ "special": true
722
+ },
723
+ "128090": {
724
+ "content": "<|reserved_special_token_82|>",
725
+ "lstrip": false,
726
+ "normalized": false,
727
+ "rstrip": false,
728
+ "single_word": false,
729
+ "special": true
730
+ },
731
+ "128091": {
732
+ "content": "<|reserved_special_token_83|>",
733
+ "lstrip": false,
734
+ "normalized": false,
735
+ "rstrip": false,
736
+ "single_word": false,
737
+ "special": true
738
+ },
739
+ "128092": {
740
+ "content": "<|reserved_special_token_84|>",
741
+ "lstrip": false,
742
+ "normalized": false,
743
+ "rstrip": false,
744
+ "single_word": false,
745
+ "special": true
746
+ },
747
+ "128093": {
748
+ "content": "<|reserved_special_token_85|>",
749
+ "lstrip": false,
750
+ "normalized": false,
751
+ "rstrip": false,
752
+ "single_word": false,
753
+ "special": true
754
+ },
755
+ "128094": {
756
+ "content": "<|reserved_special_token_86|>",
757
+ "lstrip": false,
758
+ "normalized": false,
759
+ "rstrip": false,
760
+ "single_word": false,
761
+ "special": true
762
+ },
763
+ "128095": {
764
+ "content": "<|reserved_special_token_87|>",
765
+ "lstrip": false,
766
+ "normalized": false,
767
+ "rstrip": false,
768
+ "single_word": false,
769
+ "special": true
770
+ },
771
+ "128096": {
772
+ "content": "<|reserved_special_token_88|>",
773
+ "lstrip": false,
774
+ "normalized": false,
775
+ "rstrip": false,
776
+ "single_word": false,
777
+ "special": true
778
+ },
779
+ "128097": {
780
+ "content": "<|reserved_special_token_89|>",
781
+ "lstrip": false,
782
+ "normalized": false,
783
+ "rstrip": false,
784
+ "single_word": false,
785
+ "special": true
786
+ },
787
+ "128098": {
788
+ "content": "<|reserved_special_token_90|>",
789
+ "lstrip": false,
790
+ "normalized": false,
791
+ "rstrip": false,
792
+ "single_word": false,
793
+ "special": true
794
+ },
795
+ "128099": {
796
+ "content": "<|reserved_special_token_91|>",
797
+ "lstrip": false,
798
+ "normalized": false,
799
+ "rstrip": false,
800
+ "single_word": false,
801
+ "special": true
802
+ },
803
+ "128100": {
804
+ "content": "<|reserved_special_token_92|>",
805
+ "lstrip": false,
806
+ "normalized": false,
807
+ "rstrip": false,
808
+ "single_word": false,
809
+ "special": true
810
+ },
811
+ "128101": {
812
+ "content": "<|reserved_special_token_93|>",
813
+ "lstrip": false,
814
+ "normalized": false,
815
+ "rstrip": false,
816
+ "single_word": false,
817
+ "special": true
818
+ },
819
+ "128102": {
820
+ "content": "<|reserved_special_token_94|>",
821
+ "lstrip": false,
822
+ "normalized": false,
823
+ "rstrip": false,
824
+ "single_word": false,
825
+ "special": true
826
+ },
827
+ "128103": {
828
+ "content": "<|reserved_special_token_95|>",
829
+ "lstrip": false,
830
+ "normalized": false,
831
+ "rstrip": false,
832
+ "single_word": false,
833
+ "special": true
834
+ },
835
+ "128104": {
836
+ "content": "<|reserved_special_token_96|>",
837
+ "lstrip": false,
838
+ "normalized": false,
839
+ "rstrip": false,
840
+ "single_word": false,
841
+ "special": true
842
+ },
843
+ "128105": {
844
+ "content": "<|reserved_special_token_97|>",
845
+ "lstrip": false,
846
+ "normalized": false,
847
+ "rstrip": false,
848
+ "single_word": false,
849
+ "special": true
850
+ },
851
+ "128106": {
852
+ "content": "<|reserved_special_token_98|>",
853
+ "lstrip": false,
854
+ "normalized": false,
855
+ "rstrip": false,
856
+ "single_word": false,
857
+ "special": true
858
+ },
859
+ "128107": {
860
+ "content": "<|reserved_special_token_99|>",
861
+ "lstrip": false,
862
+ "normalized": false,
863
+ "rstrip": false,
864
+ "single_word": false,
865
+ "special": true
866
+ },
867
+ "128108": {
868
+ "content": "<|reserved_special_token_100|>",
869
+ "lstrip": false,
870
+ "normalized": false,
871
+ "rstrip": false,
872
+ "single_word": false,
873
+ "special": true
874
+ },
875
+ "128109": {
876
+ "content": "<|reserved_special_token_101|>",
877
+ "lstrip": false,
878
+ "normalized": false,
879
+ "rstrip": false,
880
+ "single_word": false,
881
+ "special": true
882
+ },
883
+ "128110": {
884
+ "content": "<|reserved_special_token_102|>",
885
+ "lstrip": false,
886
+ "normalized": false,
887
+ "rstrip": false,
888
+ "single_word": false,
889
+ "special": true
890
+ },
891
+ "128111": {
892
+ "content": "<|reserved_special_token_103|>",
893
+ "lstrip": false,
894
+ "normalized": false,
895
+ "rstrip": false,
896
+ "single_word": false,
897
+ "special": true
898
+ },
899
+ "128112": {
900
+ "content": "<|reserved_special_token_104|>",
901
+ "lstrip": false,
902
+ "normalized": false,
903
+ "rstrip": false,
904
+ "single_word": false,
905
+ "special": true
906
+ },
907
+ "128113": {
908
+ "content": "<|reserved_special_token_105|>",
909
+ "lstrip": false,
910
+ "normalized": false,
911
+ "rstrip": false,
912
+ "single_word": false,
913
+ "special": true
914
+ },
915
+ "128114": {
916
+ "content": "<|reserved_special_token_106|>",
917
+ "lstrip": false,
918
+ "normalized": false,
919
+ "rstrip": false,
920
+ "single_word": false,
921
+ "special": true
922
+ },
923
+ "128115": {
924
+ "content": "<|reserved_special_token_107|>",
925
+ "lstrip": false,
926
+ "normalized": false,
927
+ "rstrip": false,
928
+ "single_word": false,
929
+ "special": true
930
+ },
931
+ "128116": {
932
+ "content": "<|reserved_special_token_108|>",
933
+ "lstrip": false,
934
+ "normalized": false,
935
+ "rstrip": false,
936
+ "single_word": false,
937
+ "special": true
938
+ },
939
+ "128117": {
940
+ "content": "<|reserved_special_token_109|>",
941
+ "lstrip": false,
942
+ "normalized": false,
943
+ "rstrip": false,
944
+ "single_word": false,
945
+ "special": true
946
+ },
947
+ "128118": {
948
+ "content": "<|reserved_special_token_110|>",
949
+ "lstrip": false,
950
+ "normalized": false,
951
+ "rstrip": false,
952
+ "single_word": false,
953
+ "special": true
954
+ },
955
+ "128119": {
956
+ "content": "<|reserved_special_token_111|>",
957
+ "lstrip": false,
958
+ "normalized": false,
959
+ "rstrip": false,
960
+ "single_word": false,
961
+ "special": true
962
+ },
963
+ "128120": {
964
+ "content": "<|reserved_special_token_112|>",
965
+ "lstrip": false,
966
+ "normalized": false,
967
+ "rstrip": false,
968
+ "single_word": false,
969
+ "special": true
970
+ },
971
+ "128121": {
972
+ "content": "<|reserved_special_token_113|>",
973
+ "lstrip": false,
974
+ "normalized": false,
975
+ "rstrip": false,
976
+ "single_word": false,
977
+ "special": true
978
+ },
979
+ "128122": {
980
+ "content": "<|reserved_special_token_114|>",
981
+ "lstrip": false,
982
+ "normalized": false,
983
+ "rstrip": false,
984
+ "single_word": false,
985
+ "special": true
986
+ },
987
+ "128123": {
988
+ "content": "<|reserved_special_token_115|>",
989
+ "lstrip": false,
990
+ "normalized": false,
991
+ "rstrip": false,
992
+ "single_word": false,
993
+ "special": true
994
+ },
995
+ "128124": {
996
+ "content": "<|reserved_special_token_116|>",
997
+ "lstrip": false,
998
+ "normalized": false,
999
+ "rstrip": false,
1000
+ "single_word": false,
1001
+ "special": true
1002
+ },
1003
+ "128125": {
1004
+ "content": "<|reserved_special_token_117|>",
1005
+ "lstrip": false,
1006
+ "normalized": false,
1007
+ "rstrip": false,
1008
+ "single_word": false,
1009
+ "special": true
1010
+ },
1011
+ "128126": {
1012
+ "content": "<|reserved_special_token_118|>",
1013
+ "lstrip": false,
1014
+ "normalized": false,
1015
+ "rstrip": false,
1016
+ "single_word": false,
1017
+ "special": true
1018
+ },
1019
+ "128127": {
1020
+ "content": "<|reserved_special_token_119|>",
1021
+ "lstrip": false,
1022
+ "normalized": false,
1023
+ "rstrip": false,
1024
+ "single_word": false,
1025
+ "special": true
1026
+ },
1027
+ "128128": {
1028
+ "content": "<|reserved_special_token_120|>",
1029
+ "lstrip": false,
1030
+ "normalized": false,
1031
+ "rstrip": false,
1032
+ "single_word": false,
1033
+ "special": true
1034
+ },
1035
+ "128129": {
1036
+ "content": "<|reserved_special_token_121|>",
1037
+ "lstrip": false,
1038
+ "normalized": false,
1039
+ "rstrip": false,
1040
+ "single_word": false,
1041
+ "special": true
1042
+ },
1043
+ "128130": {
1044
+ "content": "<|reserved_special_token_122|>",
1045
+ "lstrip": false,
1046
+ "normalized": false,
1047
+ "rstrip": false,
1048
+ "single_word": false,
1049
+ "special": true
1050
+ },
1051
+ "128131": {
1052
+ "content": "<|reserved_special_token_123|>",
1053
+ "lstrip": false,
1054
+ "normalized": false,
1055
+ "rstrip": false,
1056
+ "single_word": false,
1057
+ "special": true
1058
+ },
1059
+ "128132": {
1060
+ "content": "<|reserved_special_token_124|>",
1061
+ "lstrip": false,
1062
+ "normalized": false,
1063
+ "rstrip": false,
1064
+ "single_word": false,
1065
+ "special": true
1066
+ },
1067
+ "128133": {
1068
+ "content": "<|reserved_special_token_125|>",
1069
+ "lstrip": false,
1070
+ "normalized": false,
1071
+ "rstrip": false,
1072
+ "single_word": false,
1073
+ "special": true
1074
+ },
1075
+ "128134": {
1076
+ "content": "<|reserved_special_token_126|>",
1077
+ "lstrip": false,
1078
+ "normalized": false,
1079
+ "rstrip": false,
1080
+ "single_word": false,
1081
+ "special": true
1082
+ },
1083
+ "128135": {
1084
+ "content": "<|reserved_special_token_127|>",
1085
+ "lstrip": false,
1086
+ "normalized": false,
1087
+ "rstrip": false,
1088
+ "single_word": false,
1089
+ "special": true
1090
+ },
1091
+ "128136": {
1092
+ "content": "<|reserved_special_token_128|>",
1093
+ "lstrip": false,
1094
+ "normalized": false,
1095
+ "rstrip": false,
1096
+ "single_word": false,
1097
+ "special": true
1098
+ },
1099
+ "128137": {
1100
+ "content": "<|reserved_special_token_129|>",
1101
+ "lstrip": false,
1102
+ "normalized": false,
1103
+ "rstrip": false,
1104
+ "single_word": false,
1105
+ "special": true
1106
+ },
1107
+ "128138": {
1108
+ "content": "<|reserved_special_token_130|>",
1109
+ "lstrip": false,
1110
+ "normalized": false,
1111
+ "rstrip": false,
1112
+ "single_word": false,
1113
+ "special": true
1114
+ },
1115
+ "128139": {
1116
+ "content": "<|reserved_special_token_131|>",
1117
+ "lstrip": false,
1118
+ "normalized": false,
1119
+ "rstrip": false,
1120
+ "single_word": false,
1121
+ "special": true
1122
+ },
1123
+ "128140": {
1124
+ "content": "<|reserved_special_token_132|>",
1125
+ "lstrip": false,
1126
+ "normalized": false,
1127
+ "rstrip": false,
1128
+ "single_word": false,
1129
+ "special": true
1130
+ },
1131
+ "128141": {
1132
+ "content": "<|reserved_special_token_133|>",
1133
+ "lstrip": false,
1134
+ "normalized": false,
1135
+ "rstrip": false,
1136
+ "single_word": false,
1137
+ "special": true
1138
+ },
1139
+ "128142": {
1140
+ "content": "<|reserved_special_token_134|>",
1141
+ "lstrip": false,
1142
+ "normalized": false,
1143
+ "rstrip": false,
1144
+ "single_word": false,
1145
+ "special": true
1146
+ },
1147
+ "128143": {
1148
+ "content": "<|reserved_special_token_135|>",
1149
+ "lstrip": false,
1150
+ "normalized": false,
1151
+ "rstrip": false,
1152
+ "single_word": false,
1153
+ "special": true
1154
+ },
1155
+ "128144": {
1156
+ "content": "<|reserved_special_token_136|>",
1157
+ "lstrip": false,
1158
+ "normalized": false,
1159
+ "rstrip": false,
1160
+ "single_word": false,
1161
+ "special": true
1162
+ },
1163
+ "128145": {
1164
+ "content": "<|reserved_special_token_137|>",
1165
+ "lstrip": false,
1166
+ "normalized": false,
1167
+ "rstrip": false,
1168
+ "single_word": false,
1169
+ "special": true
1170
+ },
1171
+ "128146": {
1172
+ "content": "<|reserved_special_token_138|>",
1173
+ "lstrip": false,
1174
+ "normalized": false,
1175
+ "rstrip": false,
1176
+ "single_word": false,
1177
+ "special": true
1178
+ },
1179
+ "128147": {
1180
+ "content": "<|reserved_special_token_139|>",
1181
+ "lstrip": false,
1182
+ "normalized": false,
1183
+ "rstrip": false,
1184
+ "single_word": false,
1185
+ "special": true
1186
+ },
1187
+ "128148": {
1188
+ "content": "<|reserved_special_token_140|>",
1189
+ "lstrip": false,
1190
+ "normalized": false,
1191
+ "rstrip": false,
1192
+ "single_word": false,
1193
+ "special": true
1194
+ },
1195
+ "128149": {
1196
+ "content": "<|reserved_special_token_141|>",
1197
+ "lstrip": false,
1198
+ "normalized": false,
1199
+ "rstrip": false,
1200
+ "single_word": false,
1201
+ "special": true
1202
+ },
1203
+ "128150": {
1204
+ "content": "<|reserved_special_token_142|>",
1205
+ "lstrip": false,
1206
+ "normalized": false,
1207
+ "rstrip": false,
1208
+ "single_word": false,
1209
+ "special": true
1210
+ },
1211
+ "128151": {
1212
+ "content": "<|reserved_special_token_143|>",
1213
+ "lstrip": false,
1214
+ "normalized": false,
1215
+ "rstrip": false,
1216
+ "single_word": false,
1217
+ "special": true
1218
+ },
1219
+ "128152": {
1220
+ "content": "<|reserved_special_token_144|>",
1221
+ "lstrip": false,
1222
+ "normalized": false,
1223
+ "rstrip": false,
1224
+ "single_word": false,
1225
+ "special": true
1226
+ },
1227
+ "128153": {
1228
+ "content": "<|reserved_special_token_145|>",
1229
+ "lstrip": false,
1230
+ "normalized": false,
1231
+ "rstrip": false,
1232
+ "single_word": false,
1233
+ "special": true
1234
+ },
1235
+ "128154": {
1236
+ "content": "<|reserved_special_token_146|>",
1237
+ "lstrip": false,
1238
+ "normalized": false,
1239
+ "rstrip": false,
1240
+ "single_word": false,
1241
+ "special": true
1242
+ },
1243
+ "128155": {
1244
+ "content": "<|reserved_special_token_147|>",
1245
+ "lstrip": false,
1246
+ "normalized": false,
1247
+ "rstrip": false,
1248
+ "single_word": false,
1249
+ "special": true
1250
+ },
1251
+ "128156": {
1252
+ "content": "<|reserved_special_token_148|>",
1253
+ "lstrip": false,
1254
+ "normalized": false,
1255
+ "rstrip": false,
1256
+ "single_word": false,
1257
+ "special": true
1258
+ },
1259
+ "128157": {
1260
+ "content": "<|reserved_special_token_149|>",
1261
+ "lstrip": false,
1262
+ "normalized": false,
1263
+ "rstrip": false,
1264
+ "single_word": false,
1265
+ "special": true
1266
+ },
1267
+ "128158": {
1268
+ "content": "<|reserved_special_token_150|>",
1269
+ "lstrip": false,
1270
+ "normalized": false,
1271
+ "rstrip": false,
1272
+ "single_word": false,
1273
+ "special": true
1274
+ },
1275
+ "128159": {
1276
+ "content": "<|reserved_special_token_151|>",
1277
+ "lstrip": false,
1278
+ "normalized": false,
1279
+ "rstrip": false,
1280
+ "single_word": false,
1281
+ "special": true
1282
+ },
1283
+ "128160": {
1284
+ "content": "<|reserved_special_token_152|>",
1285
+ "lstrip": false,
1286
+ "normalized": false,
1287
+ "rstrip": false,
1288
+ "single_word": false,
1289
+ "special": true
1290
+ },
1291
+ "128161": {
1292
+ "content": "<|reserved_special_token_153|>",
1293
+ "lstrip": false,
1294
+ "normalized": false,
1295
+ "rstrip": false,
1296
+ "single_word": false,
1297
+ "special": true
1298
+ },
1299
+ "128162": {
1300
+ "content": "<|reserved_special_token_154|>",
1301
+ "lstrip": false,
1302
+ "normalized": false,
1303
+ "rstrip": false,
1304
+ "single_word": false,
1305
+ "special": true
1306
+ },
1307
+ "128163": {
1308
+ "content": "<|reserved_special_token_155|>",
1309
+ "lstrip": false,
1310
+ "normalized": false,
1311
+ "rstrip": false,
1312
+ "single_word": false,
1313
+ "special": true
1314
+ },
1315
+ "128164": {
1316
+ "content": "<|reserved_special_token_156|>",
1317
+ "lstrip": false,
1318
+ "normalized": false,
1319
+ "rstrip": false,
1320
+ "single_word": false,
1321
+ "special": true
1322
+ },
1323
+ "128165": {
1324
+ "content": "<|reserved_special_token_157|>",
1325
+ "lstrip": false,
1326
+ "normalized": false,
1327
+ "rstrip": false,
1328
+ "single_word": false,
1329
+ "special": true
1330
+ },
1331
+ "128166": {
1332
+ "content": "<|reserved_special_token_158|>",
1333
+ "lstrip": false,
1334
+ "normalized": false,
1335
+ "rstrip": false,
1336
+ "single_word": false,
1337
+ "special": true
1338
+ },
1339
+ "128167": {
1340
+ "content": "<|reserved_special_token_159|>",
1341
+ "lstrip": false,
1342
+ "normalized": false,
1343
+ "rstrip": false,
1344
+ "single_word": false,
1345
+ "special": true
1346
+ },
1347
+ "128168": {
1348
+ "content": "<|reserved_special_token_160|>",
1349
+ "lstrip": false,
1350
+ "normalized": false,
1351
+ "rstrip": false,
1352
+ "single_word": false,
1353
+ "special": true
1354
+ },
1355
+ "128169": {
1356
+ "content": "<|reserved_special_token_161|>",
1357
+ "lstrip": false,
1358
+ "normalized": false,
1359
+ "rstrip": false,
1360
+ "single_word": false,
1361
+ "special": true
1362
+ },
1363
+ "128170": {
1364
+ "content": "<|reserved_special_token_162|>",
1365
+ "lstrip": false,
1366
+ "normalized": false,
1367
+ "rstrip": false,
1368
+ "single_word": false,
1369
+ "special": true
1370
+ },
1371
+ "128171": {
1372
+ "content": "<|reserved_special_token_163|>",
1373
+ "lstrip": false,
1374
+ "normalized": false,
1375
+ "rstrip": false,
1376
+ "single_word": false,
1377
+ "special": true
1378
+ },
1379
+ "128172": {
1380
+ "content": "<|reserved_special_token_164|>",
1381
+ "lstrip": false,
1382
+ "normalized": false,
1383
+ "rstrip": false,
1384
+ "single_word": false,
1385
+ "special": true
1386
+ },
1387
+ "128173": {
1388
+ "content": "<|reserved_special_token_165|>",
1389
+ "lstrip": false,
1390
+ "normalized": false,
1391
+ "rstrip": false,
1392
+ "single_word": false,
1393
+ "special": true
1394
+ },
1395
+ "128174": {
1396
+ "content": "<|reserved_special_token_166|>",
1397
+ "lstrip": false,
1398
+ "normalized": false,
1399
+ "rstrip": false,
1400
+ "single_word": false,
1401
+ "special": true
1402
+ },
1403
+ "128175": {
1404
+ "content": "<|reserved_special_token_167|>",
1405
+ "lstrip": false,
1406
+ "normalized": false,
1407
+ "rstrip": false,
1408
+ "single_word": false,
1409
+ "special": true
1410
+ },
1411
+ "128176": {
1412
+ "content": "<|reserved_special_token_168|>",
1413
+ "lstrip": false,
1414
+ "normalized": false,
1415
+ "rstrip": false,
1416
+ "single_word": false,
1417
+ "special": true
1418
+ },
1419
+ "128177": {
1420
+ "content": "<|reserved_special_token_169|>",
1421
+ "lstrip": false,
1422
+ "normalized": false,
1423
+ "rstrip": false,
1424
+ "single_word": false,
1425
+ "special": true
1426
+ },
1427
+ "128178": {
1428
+ "content": "<|reserved_special_token_170|>",
1429
+ "lstrip": false,
1430
+ "normalized": false,
1431
+ "rstrip": false,
1432
+ "single_word": false,
1433
+ "special": true
1434
+ },
1435
+ "128179": {
1436
+ "content": "<|reserved_special_token_171|>",
1437
+ "lstrip": false,
1438
+ "normalized": false,
1439
+ "rstrip": false,
1440
+ "single_word": false,
1441
+ "special": true
1442
+ },
1443
+ "128180": {
1444
+ "content": "<|reserved_special_token_172|>",
1445
+ "lstrip": false,
1446
+ "normalized": false,
1447
+ "rstrip": false,
1448
+ "single_word": false,
1449
+ "special": true
1450
+ },
1451
+ "128181": {
1452
+ "content": "<|reserved_special_token_173|>",
1453
+ "lstrip": false,
1454
+ "normalized": false,
1455
+ "rstrip": false,
1456
+ "single_word": false,
1457
+ "special": true
1458
+ },
1459
+ "128182": {
1460
+ "content": "<|reserved_special_token_174|>",
1461
+ "lstrip": false,
1462
+ "normalized": false,
1463
+ "rstrip": false,
1464
+ "single_word": false,
1465
+ "special": true
1466
+ },
1467
+ "128183": {
1468
+ "content": "<|reserved_special_token_175|>",
1469
+ "lstrip": false,
1470
+ "normalized": false,
1471
+ "rstrip": false,
1472
+ "single_word": false,
1473
+ "special": true
1474
+ },
1475
+ "128184": {
1476
+ "content": "<|reserved_special_token_176|>",
1477
+ "lstrip": false,
1478
+ "normalized": false,
1479
+ "rstrip": false,
1480
+ "single_word": false,
1481
+ "special": true
1482
+ },
1483
+ "128185": {
1484
+ "content": "<|reserved_special_token_177|>",
1485
+ "lstrip": false,
1486
+ "normalized": false,
1487
+ "rstrip": false,
1488
+ "single_word": false,
1489
+ "special": true
1490
+ },
1491
+ "128186": {
1492
+ "content": "<|reserved_special_token_178|>",
1493
+ "lstrip": false,
1494
+ "normalized": false,
1495
+ "rstrip": false,
1496
+ "single_word": false,
1497
+ "special": true
1498
+ },
1499
+ "128187": {
1500
+ "content": "<|reserved_special_token_179|>",
1501
+ "lstrip": false,
1502
+ "normalized": false,
1503
+ "rstrip": false,
1504
+ "single_word": false,
1505
+ "special": true
1506
+ },
1507
+ "128188": {
1508
+ "content": "<|reserved_special_token_180|>",
1509
+ "lstrip": false,
1510
+ "normalized": false,
1511
+ "rstrip": false,
1512
+ "single_word": false,
1513
+ "special": true
1514
+ },
1515
+ "128189": {
1516
+ "content": "<|reserved_special_token_181|>",
1517
+ "lstrip": false,
1518
+ "normalized": false,
1519
+ "rstrip": false,
1520
+ "single_word": false,
1521
+ "special": true
1522
+ },
1523
+ "128190": {
1524
+ "content": "<|reserved_special_token_182|>",
1525
+ "lstrip": false,
1526
+ "normalized": false,
1527
+ "rstrip": false,
1528
+ "single_word": false,
1529
+ "special": true
1530
+ },
1531
+ "128191": {
1532
+ "content": "<|reserved_special_token_183|>",
1533
+ "lstrip": false,
1534
+ "normalized": false,
1535
+ "rstrip": false,
1536
+ "single_word": false,
1537
+ "special": true
1538
+ },
1539
+ "128192": {
1540
+ "content": "<|reserved_special_token_184|>",
1541
+ "lstrip": false,
1542
+ "normalized": false,
1543
+ "rstrip": false,
1544
+ "single_word": false,
1545
+ "special": true
1546
+ },
1547
+ "128193": {
1548
+ "content": "<|reserved_special_token_185|>",
1549
+ "lstrip": false,
1550
+ "normalized": false,
1551
+ "rstrip": false,
1552
+ "single_word": false,
1553
+ "special": true
1554
+ },
1555
+ "128194": {
1556
+ "content": "<|reserved_special_token_186|>",
1557
+ "lstrip": false,
1558
+ "normalized": false,
1559
+ "rstrip": false,
1560
+ "single_word": false,
1561
+ "special": true
1562
+ },
1563
+ "128195": {
1564
+ "content": "<|reserved_special_token_187|>",
1565
+ "lstrip": false,
1566
+ "normalized": false,
1567
+ "rstrip": false,
1568
+ "single_word": false,
1569
+ "special": true
1570
+ },
1571
+ "128196": {
1572
+ "content": "<|reserved_special_token_188|>",
1573
+ "lstrip": false,
1574
+ "normalized": false,
1575
+ "rstrip": false,
1576
+ "single_word": false,
1577
+ "special": true
1578
+ },
1579
+ "128197": {
1580
+ "content": "<|reserved_special_token_189|>",
1581
+ "lstrip": false,
1582
+ "normalized": false,
1583
+ "rstrip": false,
1584
+ "single_word": false,
1585
+ "special": true
1586
+ },
1587
+ "128198": {
1588
+ "content": "<|reserved_special_token_190|>",
1589
+ "lstrip": false,
1590
+ "normalized": false,
1591
+ "rstrip": false,
1592
+ "single_word": false,
1593
+ "special": true
1594
+ },
1595
+ "128199": {
1596
+ "content": "<|reserved_special_token_191|>",
1597
+ "lstrip": false,
1598
+ "normalized": false,
1599
+ "rstrip": false,
1600
+ "single_word": false,
1601
+ "special": true
1602
+ },
1603
+ "128200": {
1604
+ "content": "<|reserved_special_token_192|>",
1605
+ "lstrip": false,
1606
+ "normalized": false,
1607
+ "rstrip": false,
1608
+ "single_word": false,
1609
+ "special": true
1610
+ },
1611
+ "128201": {
1612
+ "content": "<|reserved_special_token_193|>",
1613
+ "lstrip": false,
1614
+ "normalized": false,
1615
+ "rstrip": false,
1616
+ "single_word": false,
1617
+ "special": true
1618
+ },
1619
+ "128202": {
1620
+ "content": "<|reserved_special_token_194|>",
1621
+ "lstrip": false,
1622
+ "normalized": false,
1623
+ "rstrip": false,
1624
+ "single_word": false,
1625
+ "special": true
1626
+ },
1627
+ "128203": {
1628
+ "content": "<|reserved_special_token_195|>",
1629
+ "lstrip": false,
1630
+ "normalized": false,
1631
+ "rstrip": false,
1632
+ "single_word": false,
1633
+ "special": true
1634
+ },
1635
+ "128204": {
1636
+ "content": "<|reserved_special_token_196|>",
1637
+ "lstrip": false,
1638
+ "normalized": false,
1639
+ "rstrip": false,
1640
+ "single_word": false,
1641
+ "special": true
1642
+ },
1643
+ "128205": {
1644
+ "content": "<|reserved_special_token_197|>",
1645
+ "lstrip": false,
1646
+ "normalized": false,
1647
+ "rstrip": false,
1648
+ "single_word": false,
1649
+ "special": true
1650
+ },
1651
+ "128206": {
1652
+ "content": "<|reserved_special_token_198|>",
1653
+ "lstrip": false,
1654
+ "normalized": false,
1655
+ "rstrip": false,
1656
+ "single_word": false,
1657
+ "special": true
1658
+ },
1659
+ "128207": {
1660
+ "content": "<|reserved_special_token_199|>",
1661
+ "lstrip": false,
1662
+ "normalized": false,
1663
+ "rstrip": false,
1664
+ "single_word": false,
1665
+ "special": true
1666
+ },
1667
+ "128208": {
1668
+ "content": "<|reserved_special_token_200|>",
1669
+ "lstrip": false,
1670
+ "normalized": false,
1671
+ "rstrip": false,
1672
+ "single_word": false,
1673
+ "special": true
1674
+ },
1675
+ "128209": {
1676
+ "content": "<|reserved_special_token_201|>",
1677
+ "lstrip": false,
1678
+ "normalized": false,
1679
+ "rstrip": false,
1680
+ "single_word": false,
1681
+ "special": true
1682
+ },
1683
+ "128210": {
1684
+ "content": "<|reserved_special_token_202|>",
1685
+ "lstrip": false,
1686
+ "normalized": false,
1687
+ "rstrip": false,
1688
+ "single_word": false,
1689
+ "special": true
1690
+ },
1691
+ "128211": {
1692
+ "content": "<|reserved_special_token_203|>",
1693
+ "lstrip": false,
1694
+ "normalized": false,
1695
+ "rstrip": false,
1696
+ "single_word": false,
1697
+ "special": true
1698
+ },
1699
+ "128212": {
1700
+ "content": "<|reserved_special_token_204|>",
1701
+ "lstrip": false,
1702
+ "normalized": false,
1703
+ "rstrip": false,
1704
+ "single_word": false,
1705
+ "special": true
1706
+ },
1707
+ "128213": {
1708
+ "content": "<|reserved_special_token_205|>",
1709
+ "lstrip": false,
1710
+ "normalized": false,
1711
+ "rstrip": false,
1712
+ "single_word": false,
1713
+ "special": true
1714
+ },
1715
+ "128214": {
1716
+ "content": "<|reserved_special_token_206|>",
1717
+ "lstrip": false,
1718
+ "normalized": false,
1719
+ "rstrip": false,
1720
+ "single_word": false,
1721
+ "special": true
1722
+ },
1723
+ "128215": {
1724
+ "content": "<|reserved_special_token_207|>",
1725
+ "lstrip": false,
1726
+ "normalized": false,
1727
+ "rstrip": false,
1728
+ "single_word": false,
1729
+ "special": true
1730
+ },
1731
+ "128216": {
1732
+ "content": "<|reserved_special_token_208|>",
1733
+ "lstrip": false,
1734
+ "normalized": false,
1735
+ "rstrip": false,
1736
+ "single_word": false,
1737
+ "special": true
1738
+ },
1739
+ "128217": {
1740
+ "content": "<|reserved_special_token_209|>",
1741
+ "lstrip": false,
1742
+ "normalized": false,
1743
+ "rstrip": false,
1744
+ "single_word": false,
1745
+ "special": true
1746
+ },
1747
+ "128218": {
1748
+ "content": "<|reserved_special_token_210|>",
1749
+ "lstrip": false,
1750
+ "normalized": false,
1751
+ "rstrip": false,
1752
+ "single_word": false,
1753
+ "special": true
1754
+ },
1755
+ "128219": {
1756
+ "content": "<|reserved_special_token_211|>",
1757
+ "lstrip": false,
1758
+ "normalized": false,
1759
+ "rstrip": false,
1760
+ "single_word": false,
1761
+ "special": true
1762
+ },
1763
+ "128220": {
1764
+ "content": "<|reserved_special_token_212|>",
1765
+ "lstrip": false,
1766
+ "normalized": false,
1767
+ "rstrip": false,
1768
+ "single_word": false,
1769
+ "special": true
1770
+ },
1771
+ "128221": {
1772
+ "content": "<|reserved_special_token_213|>",
1773
+ "lstrip": false,
1774
+ "normalized": false,
1775
+ "rstrip": false,
1776
+ "single_word": false,
1777
+ "special": true
1778
+ },
1779
+ "128222": {
1780
+ "content": "<|reserved_special_token_214|>",
1781
+ "lstrip": false,
1782
+ "normalized": false,
1783
+ "rstrip": false,
1784
+ "single_word": false,
1785
+ "special": true
1786
+ },
1787
+ "128223": {
1788
+ "content": "<|reserved_special_token_215|>",
1789
+ "lstrip": false,
1790
+ "normalized": false,
1791
+ "rstrip": false,
1792
+ "single_word": false,
1793
+ "special": true
1794
+ },
1795
+ "128224": {
1796
+ "content": "<|reserved_special_token_216|>",
1797
+ "lstrip": false,
1798
+ "normalized": false,
1799
+ "rstrip": false,
1800
+ "single_word": false,
1801
+ "special": true
1802
+ },
1803
+ "128225": {
1804
+ "content": "<|reserved_special_token_217|>",
1805
+ "lstrip": false,
1806
+ "normalized": false,
1807
+ "rstrip": false,
1808
+ "single_word": false,
1809
+ "special": true
1810
+ },
1811
+ "128226": {
1812
+ "content": "<|reserved_special_token_218|>",
1813
+ "lstrip": false,
1814
+ "normalized": false,
1815
+ "rstrip": false,
1816
+ "single_word": false,
1817
+ "special": true
1818
+ },
1819
+ "128227": {
1820
+ "content": "<|reserved_special_token_219|>",
1821
+ "lstrip": false,
1822
+ "normalized": false,
1823
+ "rstrip": false,
1824
+ "single_word": false,
1825
+ "special": true
1826
+ },
1827
+ "128228": {
1828
+ "content": "<|reserved_special_token_220|>",
1829
+ "lstrip": false,
1830
+ "normalized": false,
1831
+ "rstrip": false,
1832
+ "single_word": false,
1833
+ "special": true
1834
+ },
1835
+ "128229": {
1836
+ "content": "<|reserved_special_token_221|>",
1837
+ "lstrip": false,
1838
+ "normalized": false,
1839
+ "rstrip": false,
1840
+ "single_word": false,
1841
+ "special": true
1842
+ },
1843
+ "128230": {
1844
+ "content": "<|reserved_special_token_222|>",
1845
+ "lstrip": false,
1846
+ "normalized": false,
1847
+ "rstrip": false,
1848
+ "single_word": false,
1849
+ "special": true
1850
+ },
1851
+ "128231": {
1852
+ "content": "<|reserved_special_token_223|>",
1853
+ "lstrip": false,
1854
+ "normalized": false,
1855
+ "rstrip": false,
1856
+ "single_word": false,
1857
+ "special": true
1858
+ },
1859
+ "128232": {
1860
+ "content": "<|reserved_special_token_224|>",
1861
+ "lstrip": false,
1862
+ "normalized": false,
1863
+ "rstrip": false,
1864
+ "single_word": false,
1865
+ "special": true
1866
+ },
1867
+ "128233": {
1868
+ "content": "<|reserved_special_token_225|>",
1869
+ "lstrip": false,
1870
+ "normalized": false,
1871
+ "rstrip": false,
1872
+ "single_word": false,
1873
+ "special": true
1874
+ },
1875
+ "128234": {
1876
+ "content": "<|reserved_special_token_226|>",
1877
+ "lstrip": false,
1878
+ "normalized": false,
1879
+ "rstrip": false,
1880
+ "single_word": false,
1881
+ "special": true
1882
+ },
1883
+ "128235": {
1884
+ "content": "<|reserved_special_token_227|>",
1885
+ "lstrip": false,
1886
+ "normalized": false,
1887
+ "rstrip": false,
1888
+ "single_word": false,
1889
+ "special": true
1890
+ },
1891
+ "128236": {
1892
+ "content": "<|reserved_special_token_228|>",
1893
+ "lstrip": false,
1894
+ "normalized": false,
1895
+ "rstrip": false,
1896
+ "single_word": false,
1897
+ "special": true
1898
+ },
1899
+ "128237": {
1900
+ "content": "<|reserved_special_token_229|>",
1901
+ "lstrip": false,
1902
+ "normalized": false,
1903
+ "rstrip": false,
1904
+ "single_word": false,
1905
+ "special": true
1906
+ },
1907
+ "128238": {
1908
+ "content": "<|reserved_special_token_230|>",
1909
+ "lstrip": false,
1910
+ "normalized": false,
1911
+ "rstrip": false,
1912
+ "single_word": false,
1913
+ "special": true
1914
+ },
1915
+ "128239": {
1916
+ "content": "<|reserved_special_token_231|>",
1917
+ "lstrip": false,
1918
+ "normalized": false,
1919
+ "rstrip": false,
1920
+ "single_word": false,
1921
+ "special": true
1922
+ },
1923
+ "128240": {
1924
+ "content": "<|reserved_special_token_232|>",
1925
+ "lstrip": false,
1926
+ "normalized": false,
1927
+ "rstrip": false,
1928
+ "single_word": false,
1929
+ "special": true
1930
+ },
1931
+ "128241": {
1932
+ "content": "<|reserved_special_token_233|>",
1933
+ "lstrip": false,
1934
+ "normalized": false,
1935
+ "rstrip": false,
1936
+ "single_word": false,
1937
+ "special": true
1938
+ },
1939
+ "128242": {
1940
+ "content": "<|reserved_special_token_234|>",
1941
+ "lstrip": false,
1942
+ "normalized": false,
1943
+ "rstrip": false,
1944
+ "single_word": false,
1945
+ "special": true
1946
+ },
1947
+ "128243": {
1948
+ "content": "<|reserved_special_token_235|>",
1949
+ "lstrip": false,
1950
+ "normalized": false,
1951
+ "rstrip": false,
1952
+ "single_word": false,
1953
+ "special": true
1954
+ },
1955
+ "128244": {
1956
+ "content": "<|reserved_special_token_236|>",
1957
+ "lstrip": false,
1958
+ "normalized": false,
1959
+ "rstrip": false,
1960
+ "single_word": false,
1961
+ "special": true
1962
+ },
1963
+ "128245": {
1964
+ "content": "<|reserved_special_token_237|>",
1965
+ "lstrip": false,
1966
+ "normalized": false,
1967
+ "rstrip": false,
1968
+ "single_word": false,
1969
+ "special": true
1970
+ },
1971
+ "128246": {
1972
+ "content": "<|reserved_special_token_238|>",
1973
+ "lstrip": false,
1974
+ "normalized": false,
1975
+ "rstrip": false,
1976
+ "single_word": false,
1977
+ "special": true
1978
+ },
1979
+ "128247": {
1980
+ "content": "<|reserved_special_token_239|>",
1981
+ "lstrip": false,
1982
+ "normalized": false,
1983
+ "rstrip": false,
1984
+ "single_word": false,
1985
+ "special": true
1986
+ },
1987
+ "128248": {
1988
+ "content": "<|reserved_special_token_240|>",
1989
+ "lstrip": false,
1990
+ "normalized": false,
1991
+ "rstrip": false,
1992
+ "single_word": false,
1993
+ "special": true
1994
+ },
1995
+ "128249": {
1996
+ "content": "<|reserved_special_token_241|>",
1997
+ "lstrip": false,
1998
+ "normalized": false,
1999
+ "rstrip": false,
2000
+ "single_word": false,
2001
+ "special": true
2002
+ },
2003
+ "128250": {
2004
+ "content": "<|reserved_special_token_242|>",
2005
+ "lstrip": false,
2006
+ "normalized": false,
2007
+ "rstrip": false,
2008
+ "single_word": false,
2009
+ "special": true
2010
+ },
2011
+ "128251": {
2012
+ "content": "<|reserved_special_token_243|>",
2013
+ "lstrip": false,
2014
+ "normalized": false,
2015
+ "rstrip": false,
2016
+ "single_word": false,
2017
+ "special": true
2018
+ },
2019
+ "128252": {
2020
+ "content": "<|reserved_special_token_244|>",
2021
+ "lstrip": false,
2022
+ "normalized": false,
2023
+ "rstrip": false,
2024
+ "single_word": false,
2025
+ "special": true
2026
+ },
2027
+ "128253": {
2028
+ "content": "<|reserved_special_token_245|>",
2029
+ "lstrip": false,
2030
+ "normalized": false,
2031
+ "rstrip": false,
2032
+ "single_word": false,
2033
+ "special": true
2034
+ },
2035
+ "128254": {
2036
+ "content": "<|reserved_special_token_246|>",
2037
+ "lstrip": false,
2038
+ "normalized": false,
2039
+ "rstrip": false,
2040
+ "single_word": false,
2041
+ "special": true
2042
+ },
2043
+ "128255": {
2044
+ "content": "<|reserved_special_token_247|>",
2045
+ "lstrip": false,
2046
+ "normalized": false,
2047
+ "rstrip": false,
2048
+ "single_word": false,
2049
+ "special": true
2050
+ }
2051
+ },
2052
+ "bos_token": "<|begin_of_text|>",
2053
+ "clean_up_tokenization_spaces": true,
2054
+ "eos_token": "<|end_of_text|>",
2055
+ "extra_special_tokens": {},
2056
+ "mask_token": "<|mask|>",
2057
+ "max_length": null,
2058
+ "model_input_names": [
2059
+ "input_ids",
2060
+ "attention_mask"
2061
+ ],
2062
+ "model_max_length": 1000000000000000019884624838656,
2063
+ "pad_to_multiple_of": null,
2064
+ "pad_token": "<|pad|>",
2065
+ "pad_token_type_id": 0,
2066
+ "padding_side": "right",
2067
+ "tokenizer_class": "PreTrainedTokenizerFast"
2068
+ }
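
Note on reading the config added above: a minimal sketch, using only Python's standard `json` module, of how the fields in this file could be inspected. The excerpt keeps a single `added_tokens_decoder` entry copied from the diff. The helper name `summarize_tokenizer_config` and the `NO_LIMIT_THRESHOLD` cutoff are hypothetical and are not part of the commit or of any library. The `model_max_length` value in the diff equals `int(1e30)`, which appears to be a "no limit recorded" placeholder rather than a real context length, so the sketch treats it that way as an assumption.

```python
import json

# Excerpt copied from the tokenizer_config.json diff above; only one of the
# reserved-token entries is kept to stay short.
CONFIG_EXCERPT = r"""
{
  "added_tokens_decoder": {
    "128255": {
      "content": "<|reserved_special_token_247|>",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "<|begin_of_text|>",
  "eos_token": "<|end_of_text|>",
  "mask_token": "<|mask|>",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "<|pad|>",
  "padding_side": "right",
  "tokenizer_class": "PreTrainedTokenizerFast"
}
"""

# Assumption: illustrative cutoff only. Any value at or above it is treated
# as "no practical limit recorded" instead of a real maximum length.
NO_LIMIT_THRESHOLD = 10**12


def summarize_tokenizer_config(config):
    """Return a small summary of the fields shown in the diff (hypothetical helper)."""
    decoder = config.get("added_tokens_decoder", {})
    # Keys are token ids stored as strings; convert them to integers.
    id_to_content = {int(tid): entry["content"] for tid, entry in decoder.items()}
    # Collect the ids whose entry is flagged "special": true.
    special_ids = sorted(int(tid) for tid, entry in decoder.items() if entry.get("special"))
    max_length = config.get("model_max_length")
    return {
        "id_to_content": id_to_content,
        "special_ids": special_ids,
        "has_explicit_max_length": max_length is not None and max_length < NO_LIMIT_THRESHOLD,
        "named_tokens": {
            name: config.get(name)
            for name in ("bos_token", "eos_token", "pad_token", "mask_token")
        },
        "padding_side": config.get("padding_side"),
    }


summary = summarize_tokenizer_config(json.loads(CONFIG_EXCERPT))
print(summary["special_ids"])
print(summary["named_tokens"])
print(summary["has_explicit_max_length"])
```

Because `has_explicit_max_length` comes out false for this config, a caller would likely need to pass an explicit maximum length when truncating inputs instead of relying on the stored value.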