---
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:64000
- loss:DenoisingAutoEncoderLoss
widget:
- source_sentence: đā¤ā¤¨đđĸđ đā¤đĒā¤đ ⤠đĢđŖā¤ĒđŖ đ⤍đ ⤠đđŖđąā¤ ā¤ŦđĸđĒđ ā¤đ¯
sentences:
- ' ⤪⤠ā¤ŦđĸđĒđ ⤠ā¤Ēā¤đĒđĻ đŖā¤ đ ā¤đĢā¤đĸđ˛đĸ⤪ā¤đĒđŗā¤ đŖā¤ ā¤ā¤đđĻđđŗā¤ ā¤ā¤ā¤Ŗā¤đĻ đā¤đ ā¤đĒ ā¤Ŗā¤đŖđŖā¤ đ ā¤đĢā¤đĸđ˛đĸđđŗā¤ ⤪⤠ā¤ĸā¤đĒ
đĸ⤪ā¤ā¤˛đĸđ¯'
- ' đŖā¤đā¤Ŧā¤đđĻ đŖā¤ đā¤ā¤¨đđĸđ đ đŖā¤Ēā¤đĒđĻ ā¤Ēā¤đ⤠đĸ⤪⤠đ¤ā¤đ ⤠ā¤ĸā¤ā¤ĸā¤ĸ⤠đđŖ đā¤đĒā¤đ ⤠đĸđŖā¤đ ā¤đ⤠đđąā¤ā¤Ēā¤đā¤Ē⤠đŖā¤
đ đŖā¤Ēā¤đĒ đŖā¤ā¤¨đā¤đĒ đĢđŖā¤ĒđŖ đŖā¤ đŗā¤¨ā¤đĻ đ⤍đ ⤠⤪⤠đ˛đĸ đ⤠đđŖđąā¤ ā¤ŦđĸđĒđ ā¤đ¯'
- ā¤Ēā¤đĒđĻđ đĸ ⤪⤠ā¤ĸ⤍ā¤Ŧ⤠đąā¤ ā¤ā¤¨đā¤Ŧđĸ⤪ā¤đĒ ā¤đąā¤ā¤˛ā¤˛đŖđ ā¤ā¤đ˛ā¤ ā¤Ē⤠ā¤ā¤ā¤˛đĸā¤ĸđĸđ ā¤ā¤đŗā¤đĒ đĸđĒā¤đ ⤠ā¤Ŧā¤đŗā¤đĒ ā¤Ē⤍đĒđđĸ⤪⤪ā¤
đ⤍đ ⤠⤪⤠⤤đĸ đąā¤ ā¤ā¤¨đā¤Ŧđĸ⤪ā¤đĒ đđąā¤ā¤˛ā¤˛ā¤ā¤ŖđĻ ā¤Ĩđ¯
- source_sentence: ⤪ā¤đ⤠ā¤Ŧā¤ā¤ĸ⤠đŖā¤ ⤞⤍đĒ⤠đŖā¤ đŖā¤ ā¤Ē⤠đ˛đĸ đŖā¤
sentences:
- đđŖđĢđ đ đĸ⤤đĢā¤đĻ⤞ đŖā¤ŦđĸđŖđĸ đā¤đ đĢā¤đĸđ˛đĻđŗđĢđĸ đĒā¤đā¤đĒ đ ā¤Ŧ⤠đąā¤ā¤Ēā¤đ đŖđĸđŗā¤đ ā¤ĸā¤đĻ đā¤Ĩđā¤ĨđŽđ¯
- ' đąā¤đđā¤đ ⤪ā¤đ⤠ā¤Ēā¤đĸđ ā¤đ⤠đąā¤ ā¤đąā¤đĒā¤đĒđĒ⤍đ đĢđĒ đŗā¤¨ ⤤đĸ ā¤Ŧā¤ā¤ĸ⤠đŖā¤ ⤞⤍đĒ⤠đŖā¤ đŖā¤¨đ ā¤ĸ⤍ā¤ā¤ā¤ā¤đĻđ ā¤ā¤Ŗā¤Ŗā¤¨đā¤đđŗā¤¨
đŖā¤ đ ā¤đŗā¤¨ đđĻđ ⤠ā¤Ē⤠đĢā¤đ⤪ā¤đĒ đŖā¤ ā¤Ē⤠đ˛đĸ đŗā¤ā¤¨đĒđĸ đŖā¤ đŗā¤ā¤¨ā¤đĸ đ˛đĸ⤪đĻ đŖā¤ đŖā¤đ¯'
- ' ⤠đā¤đĒđā¤đŗđĢđĸđ đŖđŖđā¤đĒđĻ đ ā¤đā¤ā¤˛đĸđŗā¤đĒ ā¤˛ā¤ā¤¨ā¤ŖđŖā¤Ŗđĸđ đĸđđŖđĸ⤪⤠đĸā¤Ē⤠⤤đĻ ā¤ĸā¤ā¤ĸā¤ĸā¤đĒ đĢ⤍đ⤍đ ā¤đĒ đ⤍⤞⤠đŖā¤ đĢā¤đĒđđŖđđĸđ
đŗđĢā¤đĒđĸđ⤠⤠đĸđđŖđĸ⤪⤠đŖā¤ đ⤍đ ⤠ā¤Ēā¤ā¤ĸā¤ĸā¤ā¤Ēā¤đĒ đŖā¤ ā¤ĸđĸđ đŖđŖđ⤠đŖā¤ đđĸ⤪ā¤ā¤ŖđĻ đā¤đđĸđŖđŖđđĸđ đđąā¤đĒā¤đĒđĒ⤍ ā¤Ēā¤
đĢā¤đ⤪ā¤đĒ đđąā¤đĒā¤đĒđĒ⤍đ ⤞ā¤ā¤¨ā¤Ŗā¤ ⤠đā¤đŗā¤đĒđ¯'
- source_sentence: đŖā¤¨ā¤ĸ⤠ā¤ĸā¤ĸ⤤đ đ ā¤đ ā¤đĒ ā¤ā¤˛ā¤ā¤ĒđŖā¤¨đ đĸ
sentences:
- đŖā¤¨ā¤ĸ⤠đ⤍đ ⤠đŖđĻđđ⤎đŖđĻđđđ ā¤đā¤đ¤ā¤đĒā¤Ē⤠ā¤ĸā¤ĸ⤤đ đ ā¤đ ā¤đĒ đā¤đŗđŗđĻ⤪ ā¤ā¤˛ā¤ā¤ĒđŖā¤¨đ đĸ đ¯
- ' ā¤đ đ˛ā¤đĒ⤠đŗā¤đ ā¤đĒđąā¤ đ⤍đ ⤠đŖā¤ā¤Ŧ⤠ā¤ĸā¤ā¤Ŗā¤ ā¤đ đ˛ā¤đŖā¤đŖā¤ ā¤ā¤Ŗā¤Ŗā¤¨đā¤đ ā¤Ŧ⤠đŗā¤ā¤¨đĒā¤đ đĸ⤪ā¤ā¤˛ā¤đĸ đ⤠đā¤đđĻđĒđĸ⤪ā¤
đ ā¤đŗā¤¨ ⤪ā¤đĒā¤đ¯'
- ' đĢā¤đĨā¤đ⤠ā¤đ¤ā¤ā¤ĸā¤Ēā¤đĒđąā¤ ⤪ā¤đ⤠đŖā¤ đąā¤đĢā¤ā¤˛ā¤ đ ⤍đŗā¤đ đ ā¤đ ⤠⤤đĸđđĸđ ā¤ā¤Ŗā¤Ŗā¤¨đā¤đ ⤪ā¤ā¤đĸ đŖā¤ ā¤Ēā¤đąā¤ā¤Ŧā¤đĒđ¯'
- source_sentence: ā¤đ
sentences:
- đ ⤍ā¤Ē⤍đąā¤ ⤠đĒā¤đā¤đĒ ā¤° ā¤Ŧ⤠đąā¤ā¤Ēā¤đ đ ā¤ā¤Ŗā¤¨đ ⤠đ§đ§ā¤ đĻ ā¤đ⤍ đ⤠⤤đĸđđĸđ đ˛ā¤đŗđĸđđđŖđđĸ đŦđ§ đŖā¤ đđĻ ā¤¤đĸđđĸđ đąā¤đđĸ
đđĸđĒā¤Ŧđĸđ đŖā¤ ⤪⤠⤪đĸ đĢā¤ā¤Ēđŗā¤đĒđĸđ đ đĸđā¤Ē⤍đ⤠đā¤ā¤ā¤đ ā¤ĸā¤ā¤Ŗā¤đ ā¤Ēā¤đŗđĢđĸđđŗā¤ ⤠đā¤đđŖđ¯
- ' ā¤đ ⤪đĸ đĸđ ā¤đđĸđ đŗđ¯'
- ' đ˛ā¤đĢā¤đŖ ⤪⤠đā¤đ đ ā¤ā¤˛ā¤ đā¤đā¤đĒ ā¤ đ§đ⤠ā¤đđ° đŖā¤ đđąā¤ā¤˛ā¤˛ā¤ā¤ŖđĻ đđ§ đ ā¤đŗā¤¨ ā¤ĸā¤đ đŗđĢā¤đā¤đąā¤ ⤠đąā¤đŗā¤đđđĸ ⤠đĸ
⤠đŖā¤¨đ ā¤Ŧā¤đŗā¤đ¯'
- source_sentence: ā¤ŦđĢđŖđŗā¤Ē đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đ⤤đĸđĻ ā¤Ēā¤đĸđ ā¤đđŖđ đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đ⤠⤪ā¤đ⤠ā¤ĸā¤
đ ā¤đ¤ā¤ā¤¨đ⤠⤤đĸđđĸđ đĢā¤đđā¤ā¤˛đĸ ⤪ā¤ā¤Ŗđĸđ
sentences:
- ā¤đ đĸđā¤Ēā¤ā¤¤ā¤¤đĸ⤪⤠⤠⤤đĸđđĸđ ā¤ŦđĢđŖđŗā¤Ē đŗđĻđĒđĸđĻđŗ đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đ⤤đĸđĻ ā¤Ēā¤đĒđĻ đŖā¤ ā¤đĸđ ā¤ĸđĸđ đĸđā¤Ŧā¤đā¤Ēā¤ā¤Ēā¤Ē⤍đ
ā¤Ēđŗā¤đĒđĸđ ā¤Ēā¤đĸđ ā¤đđŖđ đŖđĸđĒđĻā¤ĸ⤠đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đ⤠đ⤠ā¤đ đĸđ⤤đĸđĻ ā¤Ŗā¤đ⤠ā¤ĸ⤠đ ā¤đ¤ā¤ā¤¨đ⤠⤤đĸđđĸđ đđąā¤đ⤤đĸ⤪ā¤đĒ
đĢā¤đđā¤ā¤˛đĸ ⤪ā¤ā¤Ŗđĸđ ā¤Ēā¤đ˛đĸ⤪ā¤đĒđŗā¤¨đ¯
- ā¤ĒđŖā¤§đŗā¤Ŗ ⤧đĢđĸđĒđĸ đā¤đ đĢā¤đĸđ˛đĻ đŗđĢđĸ ⤠đĒā¤đā¤đĒ đđ ā¤Ŧ⤠đąā¤ā¤Ēā¤đ ā¤ā¤Ŧ⤍đŗā¤Ē⤠đā¤Ĩđđ§đŽ ā¤ā¤đ đąā¤đŗā¤đ ā¤ĸā¤đŖđ đĸđā¤ĒđŖđ
ā¤ā¤đ đ¤ā¤đ ā¤ĸđĸ⤠đđĻđ¯
- ā¤Ēā¤ā¤Ŧā¤Ŧā¤đ˛ā¤đŖđĸ đ ā¤ā¤Ēđŗā¤¨ā¤Ŧ⤍đđĸđ đ ⤍ā¤Ēā¤đđĻ đđĻ ā¤ đŗā¤đŗđĢđĻđ ā¤đĒ⤞đĸā¤Ē đŖā¤đđĻ ā¤Ŗā¤đđđĸđ ā¤ā¤Ŧā¤đŖđĻđ¤ ⤠ā¤đĒđĻđąā¤ ā¤Ē⤠ā¤Ēđŗā¤đđĸ⤪ā¤đĒ
đđĸđā¤đĒđ¯
---
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
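The Pooling module above uses CLS-token pooling (`pooling_mode_cls_token: True`), as is typical for TSDAE-trained models, rather than the mean pooling of the base model. As a minimal sketch (not part of the exported card), the same embedding can be reproduced with plain đ¤ Transformers, assuming the standard transformer weights that Sentence Transformers checkpoints bundle:

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("T-Blue/tsdae_pro_MiniLM_L12_2")
model = AutoModel.from_pretrained("T-Blue/tsdae_pro_MiniLM_L12_2")

inputs = tokenizer(
    ["example sentence"],
    padding=True,
    truncation=True,
    max_length=512,  # matches max_seq_length above
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# CLS pooling: take the hidden state of the first token,
# matching pooling_mode_cls_token=True in the architecture dump.
embeddings = outputs.last_hidden_state[:, 0]
print(embeddings.shape)  # torch.Size([1, 384])
```

Since the architecture lists no extra normalization module, this should match the output of `model.encode(...)` from the usage example below.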
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the đ¤ Hub
model = SentenceTransformer("T-Blue/tsdae_pro_MiniLM_L12_2")
# Run inference
sentences = [
'ā¤ŦđĢđŖđŗā¤Ē đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đ⤤đĸđĻ ā¤Ēā¤đĸđ ā¤đđŖđ đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đ⤠⤪ā¤đ⤠ā¤ĸ⤠đ ā¤đ¤ā¤ā¤¨đ⤠⤤đĸđđĸđ đĢā¤đđā¤ā¤˛đĸ ⤪ā¤ā¤Ŗđĸđ',
'ā¤đ đĸđā¤Ēā¤ā¤¤ā¤¤đĸ⤪⤠⤠⤤đĸđđĸđ ā¤ŦđĢđŖđŗā¤Ē đŗđĻđĒđĸđĻđŗ đĸđĸ đŗđĢđĸđđĻ đ ā¤đ˛đĸ đ ā¤đĢđĸđ đ ā¤đ⤤đĸđĻ ā¤Ēā¤đĒđĻ đŖā¤ ā¤đĸđ ā¤ĸđĸđ đĸđā¤Ŧā¤đā¤Ēā¤ā¤Ēā¤Ē⤍đ ā¤Ēđŗā¤đĒđĸđ ā¤Ēā¤đĸđ ā¤đđŖđ đŖđĸđĒđĻā¤ĸ⤠đŖā¤ đ˛ā¤đŗā¤ā¤˛ā¤¨ā¤˛ā¤˛ā¤¨đ⤠đ⤠ā¤đ đĸđ⤤đĸđĻ ā¤Ŗā¤đ⤠ā¤ĸ⤠đ ā¤đ¤ā¤ā¤¨đ⤠⤤đĸđđĸđ đđąā¤đ⤤đĸ⤪ā¤đĒ đĢā¤đđā¤ā¤˛đĸ ⤪ā¤ā¤Ŗđĸđ ā¤Ēā¤đ˛đĸ⤪ā¤đĒđŗā¤¨đ¯',
'ā¤ĒđŖā¤§đŗā¤Ŗ ⤧đĢđĸđĒđĸ đā¤đ đĢā¤đĸđ˛đĻ đŗđĢđĸ ⤠đĒā¤đā¤đĒ đđ ā¤Ŧ⤠đąā¤ā¤Ēā¤đ ā¤ā¤Ŧ⤍đŗā¤Ē⤠đā¤Ĩđđ§đŽ ā¤ā¤đ đąā¤đŗā¤đ ā¤ĸā¤đŖđ đĸđā¤ĒđŖđ ā¤ā¤đ đ¤ā¤đ ā¤ĸđĸ⤠đđĻđ¯',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 64,000 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0 | sentence_1 |
  |:--------|:-----------|:-----------|
  | type    | string     | string     |
* Samples:
  | sentence_0 | sentence_1 |
  |:-----------|:-----------|
  | <code>đ⤍đŖā¤¨ ā¤ĸđĸđĒđđĸđđĻđ⤍đŗā¤ ā¤ĒđĻđ⤍đ</code> | <code>ā¤ĒđĻđ⤍đ ā¤Ēā¤ā¤Ŧ⤠⤪ā¤đ⤠đ⤍đŖā¤¨ đŖā¤ ā¤ĸđĸđĒđđĸđđĻđ⤍đŗā¤ đŖā¤ ā¤ĒđĻđ⤍đ ā¤Ēā¤ā¤¤đĢđŖā¤Ŧā¤đ¯</code> |
  | <code>⤠⤤đĸā¤ĸđĸ⤪đŖā¤Ŗđĸđ đŗā¤đŖā¤đĒđąā¤đĒ đŗā¤¨ ā¤ā¤đĒ⤠đ ā¤ā¤Ēđŗā¤ā¤Ŗđĸđ</code> | <code>ā¤ā¤ĸđŖđā¤đĸđā¤đ ā¤đĒ ā¤ ā¤Ŗā¤đąā¤đ⤤đĸđ ⤤đĸā¤ĸđĸ⤪đŖā¤Ŗđĸđ đŗā¤đŖā¤đĒđąā¤đĒ đā¤đ ā¤đā¤đĻ đ ā¤đŗā¤¨ ā¤đ đ˛ā¤đđĸ đ¤ā¤ đŗā¤¨ đĸ⤪⤠ā¤ā¤đĒ⤠đ ⤍ā¤Ēā¤đđĻ ā¤ đ ā¤ā¤Ēđŗā¤ā¤Ŗđĸđ ā¤ā¤ĸđŖđā¤đđŗā¤¨đ¯</code> |
  | <code>đŖā¤ ā¤Ŧ⤍đŖā¤¨đ đ ā¤đąā¤ đā¤đĒđĸđŖā¤¨đ đ ⤍đā¤ā¤˛ā¤˛ā¤¨ ā¤Ē⤠đ¯</code> | <code>ā¤Ē⤠ā¤ĸ⤠đŖā¤ ā¤Ŧ⤍đŖā¤¨đ đ ā¤đąā¤ ā¤Ŧ⤠đā¤đĒđĸđŖā¤¨đ ā¤đā¤đĒ⤤đĢđĸđŗā¤Ē đŖā¤ā¤ĸā¤đ⤎đŖā¤ā¤ĸā¤đ đŖā¤ đ ⤍đā¤ā¤˛ā¤˛ā¤¨ đ ā¤đŗā¤¨ ā¤ā¤˛ā¤ā¤ā¤ đŖā¤ ā¤ā¤¨đā¤Ŧđĸ⤪ā¤đĒ đ ā¤đā¤đĸđā¤ā¤Ē⤠đ⤪ā¤đ⤤đĸ ā¤Ē⤠đā¤đ ⤍đŗ đ¯</code> |
* Loss: [DenoisingAutoEncoderLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
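
`DenoisingAutoEncoderLoss` is the TSDAE objective: the encoder embeds a corrupted sentence (here `sentence_0`) and a tied decoder must reconstruct the original (`sentence_1`). The card does not include the training script, but a minimal sketch following the Sentence Transformers TSDAE recipe would look like this (the corpus list is a placeholder):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, losses
from sentence_transformers.datasets import DenoisingAutoEncoderDataset

model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")

# Placeholder: raw, unlabeled sentences in the target script.
train_sentences = ["..."]

# Wraps each sentence into a (noisy, original) pair; the default noise
# function deletes ~60% of the tokens.
train_dataset = DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# Tie the decoder weights to the encoder, as in the TSDAE paper.
train_loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    show_progress_bar=True,
)
```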
### Training Hyperparameters
#### Non-Default Hyperparameters
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `multi_dataset_batch_sampler`: round_robin
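
With the trainer API that generated this card, these non-default values would be expressed roughly as follows (a hedged sketch; the `output_dir` is hypothetical):

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    MultiDatasetBatchSamplers,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output/tsdae_pro_MiniLM_L12_2",  # hypothetical path
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)
```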
#### All Hyperparameters