---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:523982
- loss:MSELoss
base_model: FacebookAI/xlm-roberta-base
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- negative_mse
- pearson_cosine
- spearman_cosine
model-index:
- name: SentenceTransformer based on FacebookAI/xlm-roberta-base
  results:
  - task:
      type: knowledge-distillation
      name: Knowledge Distillation
    dataset:
      name: mse en ua
      type: mse-en-ua
    metrics:
    - type: negative_mse
      value: -1.1089269071817398
      name: Negative Mse
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts17 en en
      type: sts17-en-en
    metrics:
    - type: pearson_cosine
      value: 0.6784819487397877
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.7308493185913256
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts17 en ua
      type: sts17-en-ua
    metrics:
    - type: pearson_cosine
      value: 0.592555339963418
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6197606373137193
      name: Spearman Cosine
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts17 ua ua
      type: sts17-ua-ua
    metrics:
    - type: pearson_cosine
      value: 0.6158998595292998
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.6445750755380512
      name: Spearman Cosine
license: mit
datasets:
- sentence-transformers/parallel-sentences-talks
- sentence-transformers/parallel-sentences-tatoeba
- sentence-transformers/parallel-sentences-wikimatrix
language:
- uk
- en
---

# SentenceTransformer based on FacebookAI/xlm-roberta-base

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base). It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

👉 Check out the model on [GitHub](https://github.com/panalexeu/xlm-roberta-ua-distilled).
## Model Details

### Model Description

- **Model Type:** Sentence Transformer
- **Base model:** [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Datasets:** [parallel-sentences-talks](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [parallel-sentences-wikimatrix](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-wikimatrix), [parallel-sentences-tatoeba](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-tatoeba)
- **Languages:** Ukrainian, English
- **License:** MIT

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("panalexeu/xlm-roberta-ua-distilled")

# Run inference
sentences = [
    "You'd better consult the doctor.",
    'Краще проконсультуйся у лікаря.',  # "You'd better consult the doctor."
    'Їх позначають як Aufklärungsfahrzeug 93 та Aufklärungsfahrzeug 97 відповідно.',  # "They are designated Aufklärungsfahrzeug 93 and Aufklärungsfahrzeug 97, respectively."
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
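Because English and Ukrainian sentences share one embedding space, the model can also rank candidates in one language against a query in the other. A minimal cross-lingual retrieval sketch (the query and candidate sentences below are made up for illustration):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("panalexeu/xlm-roberta-ua-distilled")

# English query, Ukrainian candidates (illustrative examples)
query = ["Where is the nearest train station?"]
candidates = [
    "Де найближча залізнична станція?",  # "Where is the nearest train station?"
    "Я люблю читати книги ввечері.",     # "I like to read books in the evening."
]

query_emb = model.encode(query)       # shape: [1, 768]
cand_embs = model.encode(candidates)  # shape: [2, 768]

# Cosine similarities; the translated candidate should score highest
scores = model.similarity(query_emb, cand_embs)
print(scores)  # shape: [1, 2]
```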
## Evaluation

### Metrics

#### Knowledge Distillation

* Dataset: `mse-en-ua`
* Evaluated with [MSEEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.MSEEvaluator)

| Metric           | Value       |
|:-----------------|:------------|
| **negative_mse** | **-1.1089** |

#### Semantic Similarity

* Datasets: `sts17-en-en`, `sts17-en-ua` and `sts17-ua-ua`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | sts17-en-en | sts17-en-ua | sts17-ua-ua |
|:--------------------|:------------|:------------|:------------|
| pearson_cosine      | 0.6785      | 0.5926      | 0.6159      |
| **spearman_cosine** | **0.7308**  | **0.6198**  | **0.6446**  |

## Training Details

### Training Dataset

* Datasets: [parallel-sentences-talks](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [parallel-sentences-wikimatrix](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-wikimatrix), [parallel-sentences-tatoeba](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-tatoeba)
* Size: 523,982 training samples
* Columns: `english`, `non_english`, and `label`
* Approximate statistics based on the first 1000 samples:

  |      | english | non_english | label |
  |:-----|:--------|:------------|:------|
  | type | string  | string      | list  |

* Samples:

  | english | non_english | label |
  |:--------|:------------|:------|
  | Her real name is Lydia (リディア, Ridia), but she was mistaken for a boy and called Ricard. | Справжнє ім'я — Лідія, але її помилково сприйняли за хлопчика і назвали Рікард. | [0.15217968821525574, -0.17830222845077515, -0.12677159905433655, 0.22082313895225525, 0.40085524320602417, ...] |
  | (Applause) So he didn't just learn water. | (Аплодисменти) Він не тільки вивчив слово "вода". | [-0.1058148592710495, -0.08846072107553482, -0.2684604823589325, -0.105219267308712, 0.3050258755683899, ...] |
  | It is tightly integrated with SAM, the Storage and Archive Manager, and hence is often referred to as SAM-QFS. | Вона тісно інтегрована з SAM (Storage and Archive Manager), тому часто називається SAM-QFS. | [0.03270340710878372, -0.45798248052597046, -0.20090211927890778, 0.006579531356692314, -0.03178019821643829, ...] |

* Loss: [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)

### Evaluation Dataset

* Datasets: [parallel-sentences-talks](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-talks), [parallel-sentences-wikimatrix](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-wikimatrix), [parallel-sentences-tatoeba](https://huggingface.co/datasets/sentence-transformers/parallel-sentences-tatoeba)
* Size: 3,838 evaluation samples
* Columns: `english`, `non_english`, and `label`
* Approximate statistics based on the first 1000 samples:

  |      | english | non_english | label |
  |:-----|:--------|:------------|:------|
  | type | string  | string      | list  |

* Samples:

  | english | non_english | label |
  |:--------|:------------|:------|
  | I have lost my wallet. | Я загубив гаманець. | [-0.11186987161636353, -0.03419225662946701, -0.31304317712783813, 0.0838347002863884, 0.108644500374794, ...] |
  | It's a pharmaceutical product. | Це фармацевтичний продукт. | [0.04133488982915878, -0.4182000756263733, -0.30786487460136414, -0.09351564198732376, -0.023946482688188553, ...] |
  | We've all heard of the Casual Friday thing. | Всі ми чули про «джинсову п’ятницю» (вільна форма одягу). | [-0.10697802156209946, 0.21002227067947388, -0.2513434886932373, -0.3718843460083008, 0.06871984899044037, ...] |

* Loss: [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
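The `label` column in the samples above is a teacher model's embedding of the English sentence: following the knowledge-distillation recipe of Reimers & Gurevych (2020), the student learns to reproduce that vector for both the English sentence and its Ukrainian translation. The card does not name the teacher, so the sketch below uses a placeholder for it, and the `en-uk` subset name is likewise an assumption:

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

# Placeholder: the teacher actually used for this model is not stated in the card
teacher = SentenceTransformer("YOUR-TEACHER-MODEL")

# Assumed subset name; the parallel-sentences datasets are organized by language pair
pairs = load_dataset("sentence-transformers/parallel-sentences-talks", "en-uk", split="train")

def add_teacher_label(batch):
    # The teacher embedding of the English side becomes the MSE regression target
    batch["label"] = teacher.encode(batch["english"]).tolist()
    return batch

train_dataset = pairs.map(add_teacher_label, batched=True)
print(train_dataset.column_names)  # ['english', 'non_english', 'label']
```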
### Training Hyperparameters

#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 3
- `num_train_epochs`: 4
- `warmup_ratio`: 0.1

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 3
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 4
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional

</details>
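With a dataset in the `english`/`non_english`/`label` format shown earlier, a training run under these hyperparameters could look roughly like the sketch below. It assumes `train_dataset` and `eval_dataset` were prepared as in the earlier teacher-labeling snippet, and the output directory name is arbitrary:

```python
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MSELoss

# Student: the base model; a mean-pooling head is added automatically
student = SentenceTransformer("FacebookAI/xlm-roberta-base")

args = SentenceTransformerTrainingArguments(
    output_dir="xlm-roberta-ua-distilled",  # arbitrary
    num_train_epochs=4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=3,
    warmup_ratio=0.1,
    eval_strategy="steps",
)

trainer = SentenceTransformerTrainer(
    model=student,
    args=args,
    train_dataset=train_dataset,  # columns: english, non_english, label
    eval_dataset=eval_dataset,
    loss=MSELoss(student),  # regresses student embeddings onto the teacher labels
)
trainer.train()
```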
### Training Logs

| Epoch  | Step  | Training Loss | Validation Loss | mse-en-ua_negative_mse | sts17-en-en_spearman_cosine | sts17-en-ua_spearman_cosine | sts17-ua-ua_spearman_cosine |
|:------:|:-----:|:-------------:|:---------------:|:----------------------:|:---------------------------:|:---------------------------:|:---------------------------:|
| 0.0938 | 1024  | 0.3281        | 0.0297          | -2.9592                | 0.2325                      | 0.1547                      | 0.2265                      |
| 0.1876 | 2048  | 0.1136        | 0.2042          | -21.6693               | 0.0553                      | 0.0429                      | 0.2442                      |
| 0.2814 | 3072  | 0.1008        | 0.0273          | -2.7461                | 0.2666                      | 0.0758                      | 0.2613                      |
| 0.3752 | 4096  | 0.0843        | 0.0243          | -2.4623                | 0.2541                      | 0.0012                      | 0.3680                      |
| 0.4690 | 5120  | 0.0756        | 0.0216          | -2.2095                | 0.3933                      | 0.2535                      | 0.4342                      |
| 0.5628 | 6144  | 0.0661        | 0.0187          | -1.9539                | 0.5739                      | 0.4222                      | 0.5056                      |
| 0.6566 | 7168  | 0.0579        | 0.0164          | -1.7513                | 0.6184                      | 0.4897                      | 0.5826                      |
| 0.7504 | 8192  | 0.0526        | 0.0153          | -1.6546                | 0.6219                      | 0.4568                      | 0.5842                      |
| 0.8442 | 9216  | 0.0488        | 0.0142          | -1.5525                | 0.6160                      | 0.5012                      | 0.5884                      |
| 0.9380 | 10240 | 0.046         | 0.0135          | -1.4957                | 0.6361                      | 0.5046                      | 0.5969                      |
| 1.0318 | 11264 | 0.0437        | 0.0130          | -1.4506                | 0.6453                      | 0.5093                      | 0.5939                      |
| 1.1256 | 12288 | 0.0419        | 0.0125          | -1.4049                | 0.6403                      | 0.5054                      | 0.6020                      |
| 1.2194 | 13312 | 0.0404        | 0.0122          | -1.3794                | 0.6654                      | 0.5442                      | 0.6182                      |
| 1.3132 | 14336 | 0.0394        | 0.0118          | -1.3434                | 0.6800                      | 0.5790                      | 0.6291                      |
| 1.4070 | 15360 | 0.0383        | 0.0115          | -1.3184                | 0.6836                      | 0.5805                      | 0.6301                      |
| 1.5008 | 16384 | 0.0375        | 0.0114          | -1.3067                | 0.6742                      | 0.5555                      | 0.6055                      |
| 1.5946 | 17408 | 0.0368        | 0.0111          | -1.2864                | 0.6909                      | 0.5765                      | 0.6256                      |
| 1.6884 | 18432 | 0.036         | 0.0109          | -1.2633                | 0.6875                      | 0.5801                      | 0.6178                      |
| 1.7822 | 19456 | 0.0353        | 0.0107          | -1.2490                | 0.7060                      | 0.5959                      | 0.6322                      |
| 1.8760 | 20480 | 0.035         | 0.0106          | -1.2357                | 0.7127                      | 0.6047                      | 0.6389                      |
| 1.9698 | 21504 | 0.0344        | 0.0105          | -1.2265                | 0.7265                      | 0.6233                      | 0.6459                      |
| 2.0636 | 22528 | 0.0335        | 0.0103          | -1.2108                | 0.7184                      | 0.6151                      | 0.6438                      |
| 2.1574 | 23552 | 0.0327        | 0.0103          | -1.2101                | 0.7122                      | 0.6074                      | 0.6427                      |
| 2.2512 | 24576 | 0.0324        | 0.0102          | -1.1972                | 0.7232                      | 0.6174                      | 0.6447                      |
| 2.3450 | 25600 | 0.0322        | 0.0100          | -1.1813                | 0.7217                      | 0.6166                      | 0.6457                      |
| 2.4388 | 26624 | 0.032         | 0.0099          | -1.1745                | 0.7308                      | 0.6272                      | 0.6534                      |
| 2.5326 | 27648 | 0.0316        | 0.0098          | -1.1673                | 0.7289                      | 0.6125                      | 0.6441                      |
| 2.6264 | 28672 | 0.0314        | 0.0098          | -1.1622                | 0.7222                      | 0.6105                      | 0.6365                      |
| 2.7202 | 29696 | 0.0312        | 0.0097          | -1.1593                | 0.7175                      | 0.6121                      | 0.6348                      |
| 2.8140 | 30720 | 0.0308        | 0.0096          | -1.1457                | 0.7204                      | 0.6044                      | 0.6377                      |
| 2.9078 | 31744 | 0.0307        | 0.0095          | -1.1411                | 0.7230                      | 0.6175                      | 0.6353                      |
| 3.0016 | 32768 | 0.0305        | 0.0095          | -1.1414                | 0.7130                      | 0.6052                      | 0.6340                      |
| 3.0954 | 33792 | 0.0296        | 0.0095          | -1.1360                | 0.7234                      | 0.6160                      | 0.6411                      |
| 3.1892 | 34816 | 0.0295        | 0.0094          | -1.1317                | 0.7220                      | 0.6131                      | 0.6396                      |
| 3.2830 | 35840 | 0.0294        | 0.0094          | -1.1306                | 0.7315                      | 0.6167                      | 0.6505                      |
| 3.3768 | 36864 | 0.0293        | 0.0094          | -1.1263                | 0.7219                      | 0.6089                      | 0.6450                      |
| 3.4706 | 37888 | 0.0292        | 0.0093          | -1.1225                | 0.7236                      | 0.6141                      | 0.6451                      |
| 3.5644 | 38912 | 0.0291        | 0.0093          | -1.1204                | 0.7331                      | 0.6179                      | 0.6460                      |
| 3.6582 | 39936 | 0.029         | 0.0092          | -1.1147                | 0.7226                      | 0.6127                      | 0.6406                      |
| 3.7520 | 40960 | 0.029         | 0.0092          | -1.1118                | 0.7245                      | 0.6184                      | 0.6425                      |
| 3.8458 | 41984 | 0.0289        | 0.0092          | -1.1102                | 0.7279                      | 0.6179                      | 0.6465                      |
| 3.9396 | 43008 | 0.0288        | 0.0092          | -1.1099                | 0.7298                      | 0.6191                      | 0.6438                      |
| 3.9997 | 43664 | -             | 0.0092          | -1.1089                | 0.7308                      | 0.6198                      | 0.6446                      |
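The `spearman_cosine`/`pearson_cosine` columns above come from `EmbeddingSimilarityEvaluator` runs on STS17 sentence pairs. A toy sketch of the same kind of evaluation (the two pairs and gold scores here are made up; the actual STS17 EN-UA benchmark data is not bundled with this card):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction

model = SentenceTransformer("panalexeu/xlm-roberta-ua-distilled")

# Toy cross-lingual pairs with gold similarity scores in [0, 1]
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["A man is playing a guitar.", "A plane is taking off."],
    sentences2=["Чоловік грає на гітарі.", "Кіт спить на дивані."],  # "A man plays guitar." / "A cat sleeps on the sofa."
    scores=[0.95, 0.05],
    main_similarity=SimilarityFunction.COSINE,
    name="sts17-en-ua-toy",
)
results = evaluator(model)
print(results)  # keys include '..._pearson_cosine' and '..._spearman_cosine'
```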
### Framework Versions

- Python: 3.11.11
- Sentence Transformers: 3.4.1
- Transformers: 4.51.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.5.0
- Tokenizers: 0.21.0

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MSELoss

```bibtex
@inproceedings{reimers-2020-multilingual-sentence-bert,
    title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/2004.09813",
}
```