---
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
datasets:
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet
language:
- ar
library_name: sentence-transformers
license: apache-2.0
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
pipeline_tag: feature-extraction
tags:
- mteb
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:557850
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط
النظيفة
sentences:
- رجل يقدم عرضاً
- هناك رجل بالخارج قرب الشاطئ
- رجل يجلس على أريكه
- source_sentence: رجل يقفز إلى سريره القذر
sentences:
- السرير قذر.
- رجل يضحك أثناء غسيل الملابس
- الرجل على القمر
- source_sentence: الفتيات بالخارج
sentences:
- امرأة تلف الخيط إلى كرات بجانب كومة من الكرات
- فتيان يركبان في جولة متعة
- ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط والثالثة تتحدث
إليهن
- source_sentence: الرجل يرتدي قميصاً أزرق.
sentences:
- رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة حمراء
مع الماء في الخلفية.
- كتاب القصص مفتوح
- رجل يرتدي قميص أسود يعزف على الجيتار.
- source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة
شابة.
sentences:
- ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه
- رجل يستلقي على وجهه على مقعد في الحديقة.
- الشاب نائم بينما الأم تقود ابنتها إلى الحديقة
model-index:
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
results:
- task:
type: Retrieval
dataset:
name: MTEB MintakaRetrieval (ar)
type: mintaka/mmteb-mintaka
config: ar
split: test
revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
metrics:
- type: main_score
value: 12.493
- type: map_at_1
value: 5.719
- type: map_at_3
value: 8.269
- type: map_at_5
value: 9.172
- type: map_at_10
value: 9.894
- type: ndcg_at_1
value: 5.719
- type: ndcg_at_3
value: 9.128
- type: ndcg_at_5
value: 10.745
- type: ndcg_at_10
value: 12.493
- type: recall_at_1
value: 5.719
- type: recall_at_3
value: 11.621
- type: recall_at_5
value: 15.524
- type: recall_at_10
value: 20.926
- type: precision_at_1
value: 5.719
- type: precision_at_3
value: 3.874
- type: precision_at_5
value: 3.105
- type: precision_at_10
value: 2.093
- type: mrr_at_1
value: 5.7195
- type: mrr_at_3
value: 8.269
- type: mrr_at_5
value: 9.1723
- type: mrr_at_10
value: 9.8942
- task:
type: Retrieval
dataset:
name: MTEB MIRACLRetrievalHardNegatives (ar)
type: miracl/mmteb-miracl-hardnegatives
config: ar
split: dev
revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
metrics:
- type: main_score
value: 22.396
- type: map_at_1
value: 8.866
- type: map_at_3
value: 13.905
- type: map_at_5
value: 15.326
- type: map_at_10
value: 16.851
- type: ndcg_at_1
value: 13.9
- type: ndcg_at_3
value: 17.309
- type: ndcg_at_5
value: 19.174
- type: ndcg_at_10
value: 22.396
- type: recall_at_1
value: 8.866
- type: recall_at_3
value: 19.177
- type: recall_at_5
value: 23.999
- type: recall_at_10
value: 32.421
- type: precision_at_1
value: 13.9
- type: precision_at_3
value: 10.933
- type: precision_at_5
value: 8.5
- type: precision_at_10
value: 5.96
- type: mrr_at_1
value: 13.9
- type: mrr_at_3
value: 20.0667
- type: mrr_at_5
value: 21.3617
- type: mrr_at_10
value: 22.7531
- task:
type: Retrieval
dataset:
name: MTEB MLQARetrieval (ar)
type: mlqa/mmteb-mlqa
config: ar
split: validation
revision: 397ed406c1a7902140303e7faf60fff35b58d285
metrics:
- type: main_score
value: 57.312
- type: map_at_1
value: 44.487
- type: map_at_3
value: 50.516
- type: map_at_5
value: 51.715
- type: map_at_10
value: 52.778
- type: ndcg_at_1
value: 44.487
- type: ndcg_at_3
value: 52.586
- type: ndcg_at_5
value: 54.742
- type: ndcg_at_10
value: 57.312
- type: recall_at_1
value: 44.487
- type: recall_at_3
value: 58.607
- type: recall_at_5
value: 63.83
- type: recall_at_10
value: 71.76
- type: precision_at_1
value: 44.487
- type: precision_at_3
value: 19.536
- type: precision_at_5
value: 12.766
- type: precision_at_10
value: 7.176
- type: mrr_at_1
value: 44.4874
- type: mrr_at_3
value: 50.5158
- type: mrr_at_5
value: 51.715
- type: mrr_at_10
value: 52.7782
- task:
type: Retrieval
dataset:
name: MTEB SadeemQuestionRetrieval (ar)
type: sadeem/mmteb-sadeem
config: default
split: test
revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
metrics:
- type: main_score
value: 52.976
- type: map_at_1
value: 22.307
- type: map_at_3
value: 41.727
- type: map_at_5
value: 43.052
- type: map_at_10
value: 43.844
- type: ndcg_at_1
value: 22.307
- type: ndcg_at_3
value: 48.7
- type: ndcg_at_5
value: 51.057
- type: ndcg_at_10
value: 52.976
- type: recall_at_1
value: 22.307
- type: recall_at_3
value: 69.076
- type: recall_at_5
value: 74.725
- type: recall_at_10
value: 80.661
- type: precision_at_1
value: 22.307
- type: precision_at_3
value: 23.025
- type: precision_at_5
value: 14.945
- type: precision_at_10
value: 8.066
- type: mrr_at_1
value: 21.0148
- type: mrr_at_3
value: 40.8808
- type: mrr_at_5
value: 42.1254
- type: mrr_at_10
value: 42.9125
- task:
type: STS
dataset:
name: MTEB BIOSSES (default)
type: mteb/biosses-sts
config: default
split: test
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
metrics:
- type: cosine_pearson
value: 72.5081840952171
- type: cosine_spearman
value: 69.41362982941537
- type: euclidean_pearson
value: 67.45121490183709
- type: euclidean_spearman
value: 67.15273493989758
- type: main_score
value: 69.41362982941537
- type: manhattan_pearson
value: 67.6119022794479
- type: manhattan_spearman
value: 67.51659865246586
- task:
type: STS
dataset:
name: MTEB SICK-R (default)
type: mteb/sickr-sts
config: default
split: test
revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
metrics:
- type: cosine_pearson
value: 83.61591268324493
- type: cosine_spearman
value: 79.61914245705792
- type: euclidean_pearson
value: 81.32044881859483
- type: euclidean_spearman
value: 79.04866675279919
- type: main_score
value: 79.61914245705792
- type: manhattan_pearson
value: 81.09220518201322
- type: manhattan_spearman
value: 78.87590523907905
- task:
type: STS
dataset:
name: MTEB STS12 (default)
type: mteb/sts12-sts
config: default
split: test
revision: a0d554a64d88156834ff5ae9920b964011b16384
metrics:
- type: cosine_pearson
value: 84.59807803376341
- type: cosine_spearman
value: 77.38689922564416
- type: euclidean_pearson
value: 83.92034850646732
- type: euclidean_spearman
value: 76.75857193093438
- type: main_score
value: 77.38689922564416
- type: manhattan_pearson
value: 83.97191863964667
- type: manhattan_spearman
value: 76.89790070725708
- task:
type: STS
dataset:
name: MTEB STS13 (default)
type: mteb/sts13-sts
config: default
split: test
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
metrics:
- type: cosine_pearson
value: 78.18664268536664
- type: cosine_spearman
value: 79.58989311630421
- type: euclidean_pearson
value: 79.25259731614729
- type: euclidean_spearman
value: 80.1701122827397
- type: main_score
value: 79.58989311630421
- type: manhattan_pearson
value: 79.12601451996869
- type: manhattan_spearman
value: 79.98999436073663
- task:
type: STS
dataset:
name: MTEB STS14 (default)
type: mteb/sts14-sts
config: default
split: test
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
metrics:
- type: cosine_pearson
value: 80.97541876658141
- type: cosine_spearman
value: 79.78614320477877
- type: euclidean_pearson
value: 81.01514505747167
- type: euclidean_spearman
value: 80.73664735567839
- type: main_score
value: 79.78614320477877
- type: manhattan_pearson
value: 80.8746560526314
- type: manhattan_spearman
value: 80.67025673179079
- task:
type: STS
dataset:
name: MTEB STS15 (default)
type: mteb/sts15-sts
config: default
split: test
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
metrics:
- type: cosine_pearson
value: 85.23661155813113
- type: cosine_spearman
value: 86.21134464371615
- type: euclidean_pearson
value: 85.82518684522182
- type: euclidean_spearman
value: 86.43600784349509
- type: main_score
value: 86.21134464371615
- type: manhattan_pearson
value: 85.83101152371589
- type: manhattan_spearman
value: 86.42228695679498
- task:
type: STS
dataset:
name: MTEB STS16 (default)
type: mteb/sts16-sts
config: default
split: test
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
metrics:
- type: cosine_pearson
value: 79.20106689077852
- type: cosine_spearman
value: 81.39570893867825
- type: euclidean_pearson
value: 80.39578888768929
- type: euclidean_spearman
value: 81.19950443340412
- type: main_score
value: 81.39570893867825
- type: manhattan_pearson
value: 80.2226679341839
- type: manhattan_spearman
value: 80.99142422593823
- task:
type: STS
dataset:
name: MTEB STS17 (ar-ar)
type: mteb/sts17-crosslingual-sts
config: ar-ar
split: test
revision: faeb762787bd10488a50c8b5be4a3b82e411949c
metrics:
- type: cosine_pearson
value: 81.05294851623468
- type: cosine_spearman
value: 81.10570655134113
- type: euclidean_pearson
value: 79.22292773537778
- type: euclidean_spearman
value: 78.84204232638425
- type: main_score
value: 81.10570655134113
- type: manhattan_pearson
value: 79.43750460320484
- type: manhattan_spearman
value: 79.33713593557482
- task:
type: STS
dataset:
name: MTEB STS22 (ar)
type: mteb/sts22-crosslingual-sts
config: ar
split: test
revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
metrics:
- type: cosine_pearson
value: 45.96875498680092
- type: cosine_spearman
value: 52.405509117149904
- type: euclidean_pearson
value: 42.097450896728226
- type: euclidean_spearman
value: 50.89022884113707
- type: main_score
value: 52.405509117149904
- type: manhattan_pearson
value: 42.22827727075534
- type: manhattan_spearman
value: 50.912841055442634
- task:
type: STS
dataset:
name: MTEB STSBenchmark (default)
type: mteb/stsbenchmark-sts
config: default
split: test
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
metrics:
- type: cosine_pearson
value: 83.13261516884116
- type: cosine_spearman
value: 84.3492527221498
- type: euclidean_pearson
value: 82.691603178401
- type: euclidean_spearman
value: 83.0499566200785
- type: main_score
value: 84.3492527221498
- type: manhattan_pearson
value: 82.68307441014618
- type: manhattan_spearman
value: 83.01315787964519
- task:
type: Summarization
dataset:
name: MTEB SummEval (default)
type: mteb/summeval
config: default
split: test
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
metrics:
- type: cosine_pearson
value: 31.149232235402845
- type: cosine_spearman
value: 30.685504130606255
- type: dot_pearson
value: 27.466307571160375
- type: dot_spearman
value: 28.93064261485915
- type: main_score
value: 30.685504130606255
- type: pearson
value: 31.149232235402845
- type: spearman
value: 30.685504130606255
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 256
type: sts-test-256
metrics:
- type: pearson_cosine
value: 0.8264447022356382
name: Pearson Cosine
- type: spearman_cosine
value: 0.8386403752382455
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8219134931449013
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.825509659109493
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8223094468630248
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8260503151751462
name: Spearman Euclidean
- type: pearson_dot
value: 0.6375226884845725
name: Pearson Dot
- type: spearman_dot
value: 0.6287228614640888
name: Spearman Dot
- type: pearson_max
value: 0.8264447022356382
name: Pearson Max
- type: spearman_max
value: 0.8386403752382455
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 128
type: sts-test-128
metrics:
- type: pearson_cosine
value: 0.8209661910768973
name: Pearson Cosine
- type: spearman_cosine
value: 0.8347149482673766
name: Spearman Cosine
- type: pearson_manhattan
value: 0.8082811559854036
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.8148314269262763
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.8093138512113149
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.8156468458613929
name: Spearman Euclidean
- type: pearson_dot
value: 0.5795109620454884
name: Pearson Dot
- type: spearman_dot
value: 0.5760223026552876
name: Spearman Dot
- type: pearson_max
value: 0.8209661910768973
name: Pearson Max
- type: spearman_max
value: 0.8347149482673766
name: Spearman Max
- task:
type: semantic-similarity
name: Semantic Similarity
dataset:
name: sts test 64
type: sts-test-64
metrics:
- type: pearson_cosine
value: 0.808708530451336
name: Pearson Cosine
- type: spearman_cosine
value: 0.8217532539767914
name: Spearman Cosine
- type: pearson_manhattan
value: 0.7876121380998453
name: Pearson Manhattan
- type: spearman_manhattan
value: 0.7969092304137347
name: Spearman Manhattan
- type: pearson_euclidean
value: 0.7902997966909958
name: Pearson Euclidean
- type: spearman_euclidean
value: 0.7987635968785215
name: Spearman Euclidean
- type: pearson_dot
value: 0.495047136234386
name: Pearson Dot
- type: spearman_dot
value: 0.49287000679901516
name: Spearman Dot
- type: pearson_max
value: 0.808708530451336
name: Pearson Max
- type: spearman_max
value: 0.8217532539767914
name: Spearman Max
---
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) on the Omartificial-Intelligence-Space/Arabic-NLi-Triplet dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. This model is part of the [Arabic Matryoshka Embedding Models collection](https://huggingface.co/collections/Omartificial-Intelligence-Space/arabic-matryoshka-embedding-models-666f764d3b570f44d7f77d4e). It was presented in the paper [GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training](https://huggingface.co/papers/2505.24581).
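
As a rough illustration of how dense vectors support semantic search and paraphrase mining, the sketch below ranks a small corpus against a query by cosine similarity. It assumes the embeddings were already computed (for example with `SentenceTransformer.encode`); the 4-dimensional vectors stand in for real 384-dimensional embeddings, and the helpers `cosine_similarity` and `semantic_search` are hypothetical names written for this example, not this model's API.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query: list[float], corpus: list[list[float]], top_k: int = 3) -> list[tuple[int, float]]:
    """Return (corpus_index, score) pairs for the top_k corpus vectors most similar to the query."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(corpus)]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_k]


# Toy vectors standing in for model embeddings (hypothetical data).
query = [0.9, 0.1, 0.0, 0.2]
corpus = [
    [0.8, 0.2, 0.1, 0.1],  # points in nearly the same direction as the query
    [0.0, 0.9, 0.4, 0.0],  # mostly unrelated
    [0.7, 0.0, 0.0, 0.5],  # partially related
]

for idx, score in semantic_search(query, corpus, top_k=2):
    print(idx, round(score, 3))
```

Sorting every score is fine for a toy corpus; for large corpora a vector index or a partial top-k selection would be the more practical choice.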
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) <!-- at revision bf3bf13ab40c3157080a7ab344c831b9ad18b5eb -->
- **Maximum Sequence Length:** 128 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - Omartificial-Intelligence-Space/Arabic-NLi-Triplet
- **Language:** Arabic (`ar`)
- **License:** apache-2.0
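
The metadata tags this model with `MatryoshkaLoss` and reports `sts test 256`, `sts test 128` and `sts test 64` evaluations, which in Matryoshka training conventionally denote truncated embedding sizes. A minimal sketch, assuming truncation means keeping the leading components of the 384-dimensional output and re-normalizing to unit length before computing cosine similarity; the sine-based vectors and the helpers `truncate_and_normalize` and `cosine` are hypothetical, not model output or a library API.

```python
import math


def truncate_and_normalize(embedding: list[float], dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit L2 norm."""
    if not 0 < dim <= len(embedding):
        raise ValueError(f"dim must be in 1..{len(embedding)}, got {dim}")
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


def cosine(a: list[float], b: list[float]) -> float:
    """For unit-norm vectors, cosine similarity reduces to the dot product."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 384-dimensional embeddings (deterministic toy values).
emb_a = [math.sin(i * 0.1) for i in range(384)]
emb_b = [math.sin(i * 0.1 + 0.3) for i in range(384)]

# Compare the same pair at the full size and at the Matryoshka sizes evaluated above.
for dim in (384, 256, 128, 64):
    a = truncate_and_normalize(emb_a, dim)
    b = truncate_and_normalize(emb_b, dim)
    print(dim, round(cosine(a, b), 4))
```

Smaller prefixes cut storage and search cost roughly in proportion to the dimension; the reported STS scores at 256, 128 and 64 dimensions give a sense of how much quality is traded away.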
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging