|
--- |
|
base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
|
datasets: |
|
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet |
|
language: |
|
- ar |
|
library_name: sentence-transformers |
|
license: apache-2.0 |
|
metrics: |
|
- pearson_cosine |
|
- spearman_cosine |
|
- pearson_manhattan |
|
- spearman_manhattan |
|
- pearson_euclidean |
|
- spearman_euclidean |
|
- pearson_dot |
|
- spearman_dot |
|
- pearson_max |
|
- spearman_max |
|
pipeline_tag: feature-extraction |
|
tags: |
|
- mteb |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:557850 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
widget: |
|
- source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط |
|
النظيفة |
|
sentences: |
|
- رجل يقدم عرضاً |
|
- هناك رجل بالخارج قرب الشاطئ |
|
- رجل يجلس على أريكه |
|
- source_sentence: رجل يقفز إلى سريره القذر |
|
sentences: |
|
- السرير قذر. |
|
- رجل يضحك أثناء غسيل الملابس |
|
- الرجل على القمر |
|
- source_sentence: الفتيات بالخارج |
|
sentences: |
|
- امرأة تلف الخيط إلى كرات بجانب كومة من الكرات |
|
- فتيان يركبان في جولة متعة |
|
- ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط والثالثة تتحدث |
|
إليهن |
|
- source_sentence: الرجل يرتدي قميصاً أزرق. |
|
sentences: |
|
- رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة حمراء |
|
مع الماء في الخلفية. |
|
- كتاب القصص مفتوح |
|
- رجل يرتدي قميص أسود يعزف على الجيتار. |
|
- source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة |
|
شابة. |
|
sentences: |
|
- ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه |
|
- رجل يستلقي على وجهه على مقعد في الحديقة. |
|
- الشاب نائم بينما الأم تقود ابنتها إلى الحديقة |
|
model-index: |
|
- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
|
results: |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
name: MTEB MintakaRetrieval (ar) |
|
type: mintaka/mmteb-mintaka |
|
config: ar |
|
split: test |
|
revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e |
|
metrics: |
|
- type: main_score |
|
value: 12.493 |
|
- type: map_at_1 |
|
value: 5.719 |
|
- type: map_at_3 |
|
value: 8.269 |
|
- type: map_at_5 |
|
value: 9.172 |
|
- type: map_at_10 |
|
value: 9.894 |
|
- type: ndcg_at_1 |
|
value: 5.719 |
|
- type: ndcg_at_3 |
|
value: 9.128 |
|
- type: ndcg_at_5 |
|
value: 10.745 |
|
- type: ndcg_at_10 |
|
value: 12.493 |
|
- type: recall_at_1 |
|
value: 5.719 |
|
- type: recall_at_3 |
|
value: 11.621 |
|
- type: recall_at_5 |
|
value: 15.524 |
|
- type: recall_at_10 |
|
value: 20.926 |
|
- type: precision_at_1 |
|
value: 5.719 |
|
- type: precision_at_3 |
|
value: 3.874 |
|
- type: precision_at_5 |
|
value: 3.105 |
|
- type: precision_at_10 |
|
value: 2.093 |
|
- type: mrr_at_1 |
|
value: 5.7195 |
|
- type: mrr_at_3 |
|
value: 8.269 |
|
- type: mrr_at_5 |
|
value: 9.1723 |
|
- type: mrr_at_10 |
|
value: 9.8942 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
name: MTEB MIRACLRetrievalHardNegatives (ar) |
|
type: miracl/mmteb-miracl-hardnegatives |
|
config: ar |
|
split: dev |
|
revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb |
|
metrics: |
|
- type: main_score |
|
value: 22.396 |
|
- type: map_at_1 |
|
value: 8.866 |
|
- type: map_at_3 |
|
value: 13.905 |
|
- type: map_at_5 |
|
value: 15.326 |
|
- type: map_at_10 |
|
value: 16.851 |
|
- type: ndcg_at_1 |
|
value: 13.9 |
|
- type: ndcg_at_3 |
|
value: 17.309 |
|
- type: ndcg_at_5 |
|
value: 19.174 |
|
- type: ndcg_at_10 |
|
value: 22.396 |
|
- type: recall_at_1 |
|
value: 8.866 |
|
- type: recall_at_3 |
|
value: 19.177 |
|
- type: recall_at_5 |
|
value: 23.999 |
|
- type: recall_at_10 |
|
value: 32.421 |
|
- type: precision_at_1 |
|
value: 13.9 |
|
- type: precision_at_3 |
|
value: 10.933 |
|
- type: precision_at_5 |
|
value: 8.5 |
|
- type: precision_at_10 |
|
value: 5.96 |
|
- type: mrr_at_1 |
|
value: 13.9 |
|
- type: mrr_at_3 |
|
value: 20.0667 |
|
- type: mrr_at_5 |
|
value: 21.3617 |
|
- type: mrr_at_10 |
|
value: 22.7531 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
name: MTEB MLQARetrieval (ar) |
|
type: mlqa/mmteb-mlqa |
|
config: ar |
|
split: validation |
|
revision: 397ed406c1a7902140303e7faf60fff35b58d285 |
|
metrics: |
|
- type: main_score |
|
value: 57.312 |
|
- type: map_at_1 |
|
value: 44.487 |
|
- type: map_at_3 |
|
value: 50.516 |
|
- type: map_at_5 |
|
value: 51.715 |
|
- type: map_at_10 |
|
value: 52.778 |
|
- type: ndcg_at_1 |
|
value: 44.487 |
|
- type: ndcg_at_3 |
|
value: 52.586 |
|
- type: ndcg_at_5 |
|
value: 54.742 |
|
- type: ndcg_at_10 |
|
value: 57.312 |
|
- type: recall_at_1 |
|
value: 44.487 |
|
- type: recall_at_3 |
|
value: 58.607 |
|
- type: recall_at_5 |
|
value: 63.83 |
|
- type: recall_at_10 |
|
value: 71.76 |
|
- type: precision_at_1 |
|
value: 44.487 |
|
- type: precision_at_3 |
|
value: 19.536 |
|
- type: precision_at_5 |
|
value: 12.766 |
|
- type: precision_at_10 |
|
value: 7.176 |
|
- type: mrr_at_1 |
|
value: 44.4874 |
|
- type: mrr_at_3 |
|
value: 50.5158 |
|
- type: mrr_at_5 |
|
value: 51.715 |
|
- type: mrr_at_10 |
|
value: 52.7782 |
|
- task: |
|
type: Retrieval |
|
dataset: |
|
name: MTEB SadeemQuestionRetrieval (ar) |
|
type: sadeem/mmteb-sadeem |
|
config: default |
|
split: test |
|
revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9 |
|
metrics: |
|
- type: main_score |
|
value: 52.976 |
|
- type: map_at_1 |
|
value: 22.307 |
|
- type: map_at_3 |
|
value: 41.727 |
|
- type: map_at_5 |
|
value: 43.052 |
|
- type: map_at_10 |
|
value: 43.844 |
|
- type: ndcg_at_1 |
|
value: 22.307 |
|
- type: ndcg_at_3 |
|
value: 48.7 |
|
- type: ndcg_at_5 |
|
value: 51.057 |
|
- type: ndcg_at_10 |
|
value: 52.976 |
|
- type: recall_at_1 |
|
value: 22.307 |
|
- type: recall_at_3 |
|
value: 69.076 |
|
- type: recall_at_5 |
|
value: 74.725 |
|
- type: recall_at_10 |
|
value: 80.661 |
|
- type: precision_at_1 |
|
value: 22.307 |
|
- type: precision_at_3 |
|
value: 23.025 |
|
- type: precision_at_5 |
|
value: 14.945 |
|
- type: precision_at_10 |
|
value: 8.066 |
|
- type: mrr_at_1 |
|
value: 21.0148 |
|
- type: mrr_at_3 |
|
value: 40.8808 |
|
- type: mrr_at_5 |
|
value: 42.1254 |
|
- type: mrr_at_10 |
|
value: 42.9125 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB BIOSSES (default) |
|
type: mteb/biosses-sts |
|
config: default |
|
split: test |
|
revision: d3fb88f8f02e40887cd149695127462bbcf29b4a |
|
metrics: |
|
- type: cosine_pearson |
|
value: 72.5081840952171 |
|
- type: cosine_spearman |
|
value: 69.41362982941537 |
|
- type: euclidean_pearson |
|
value: 67.45121490183709 |
|
- type: euclidean_spearman |
|
value: 67.15273493989758 |
|
- type: main_score |
|
value: 69.41362982941537 |
|
- type: manhattan_pearson |
|
value: 67.6119022794479 |
|
- type: manhattan_spearman |
|
value: 67.51659865246586 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB SICK-R (default) |
|
type: mteb/sickr-sts |
|
config: default |
|
split: test |
|
revision: 20a6d6f312dd54037fe07a32d58e5e168867909d |
|
metrics: |
|
- type: cosine_pearson |
|
value: 83.61591268324493 |
|
- type: cosine_spearman |
|
value: 79.61914245705792 |
|
- type: euclidean_pearson |
|
value: 81.32044881859483 |
|
- type: euclidean_spearman |
|
value: 79.04866675279919 |
|
- type: main_score |
|
value: 79.61914245705792 |
|
- type: manhattan_pearson |
|
value: 81.09220518201322 |
|
- type: manhattan_spearman |
|
value: 78.87590523907905 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS12 (default) |
|
type: mteb/sts12-sts |
|
config: default |
|
split: test |
|
revision: a0d554a64d88156834ff5ae9920b964011b16384 |
|
metrics: |
|
- type: cosine_pearson |
|
value: 84.59807803376341 |
|
- type: cosine_spearman |
|
value: 77.38689922564416 |
|
- type: euclidean_pearson |
|
value: 83.92034850646732 |
|
- type: euclidean_spearman |
|
value: 76.75857193093438 |
|
- type: main_score |
|
value: 77.38689922564416 |
|
- type: manhattan_pearson |
|
value: 83.97191863964667 |
|
- type: manhattan_spearman |
|
value: 76.89790070725708 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS13 (default) |
|
type: mteb/sts13-sts |
|
config: default |
|
split: test |
|
revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca |
|
metrics: |
|
- type: cosine_pearson |
|
value: 78.18664268536664 |
|
- type: cosine_spearman |
|
value: 79.58989311630421 |
|
- type: euclidean_pearson |
|
value: 79.25259731614729 |
|
- type: euclidean_spearman |
|
value: 80.1701122827397 |
|
- type: main_score |
|
value: 79.58989311630421 |
|
- type: manhattan_pearson |
|
value: 79.12601451996869 |
|
- type: manhattan_spearman |
|
value: 79.98999436073663 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS14 (default) |
|
type: mteb/sts14-sts |
|
config: default |
|
split: test |
|
revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375 |
|
metrics: |
|
- type: cosine_pearson |
|
value: 80.97541876658141 |
|
- type: cosine_spearman |
|
value: 79.78614320477877 |
|
- type: euclidean_pearson |
|
value: 81.01514505747167 |
|
- type: euclidean_spearman |
|
value: 80.73664735567839 |
|
- type: main_score |
|
value: 79.78614320477877 |
|
- type: manhattan_pearson |
|
value: 80.8746560526314 |
|
- type: manhattan_spearman |
|
value: 80.67025673179079 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS15 (default) |
|
type: mteb/sts15-sts |
|
config: default |
|
split: test |
|
revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3 |
|
metrics: |
|
- type: cosine_pearson |
|
value: 85.23661155813113 |
|
- type: cosine_spearman |
|
value: 86.21134464371615 |
|
- type: euclidean_pearson |
|
value: 85.82518684522182 |
|
- type: euclidean_spearman |
|
value: 86.43600784349509 |
|
- type: main_score |
|
value: 86.21134464371615 |
|
- type: manhattan_pearson |
|
value: 85.83101152371589 |
|
- type: manhattan_spearman |
|
value: 86.42228695679498 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS16 (default) |
|
type: mteb/sts16-sts |
|
config: default |
|
split: test |
|
revision: 4d8694f8f0e0100860b497b999b3dbed754a0513 |
|
metrics: |
|
- type: cosine_pearson |
|
value: 79.20106689077852 |
|
- type: cosine_spearman |
|
value: 81.39570893867825 |
|
- type: euclidean_pearson |
|
value: 80.39578888768929 |
|
- type: euclidean_spearman |
|
value: 81.19950443340412 |
|
- type: main_score |
|
value: 81.39570893867825 |
|
- type: manhattan_pearson |
|
value: 80.2226679341839 |
|
- type: manhattan_spearman |
|
value: 80.99142422593823 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS17 (ar-ar) |
|
type: mteb/sts17-crosslingual-sts |
|
config: ar-ar |
|
split: test |
|
revision: faeb762787bd10488a50c8b5be4a3b82e411949c |
|
metrics: |
|
- type: cosine_pearson |
|
value: 81.05294851623468 |
|
- type: cosine_spearman |
|
value: 81.10570655134113 |
|
- type: euclidean_pearson |
|
value: 79.22292773537778 |
|
- type: euclidean_spearman |
|
value: 78.84204232638425 |
|
- type: main_score |
|
value: 81.10570655134113 |
|
- type: manhattan_pearson |
|
value: 79.43750460320484 |
|
- type: manhattan_spearman |
|
value: 79.33713593557482 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STS22 (ar) |
|
type: mteb/sts22-crosslingual-sts |
|
config: ar |
|
split: test |
|
revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3 |
|
metrics: |
|
- type: cosine_pearson |
|
value: 45.96875498680092 |
|
- type: cosine_spearman |
|
value: 52.405509117149904 |
|
- type: euclidean_pearson |
|
value: 42.097450896728226 |
|
- type: euclidean_spearman |
|
value: 50.89022884113707 |
|
- type: main_score |
|
value: 52.405509117149904 |
|
- type: manhattan_pearson |
|
value: 42.22827727075534 |
|
- type: manhattan_spearman |
|
value: 50.912841055442634 |
|
- task: |
|
type: STS |
|
dataset: |
|
name: MTEB STSBenchmark (default) |
|
type: mteb/stsbenchmark-sts |
|
config: default |
|
split: test |
|
revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831 |
|
metrics: |
|
- type: cosine_pearson |
|
value: 83.13261516884116 |
|
- type: cosine_spearman |
|
value: 84.3492527221498 |
|
- type: euclidean_pearson |
|
value: 82.691603178401 |
|
- type: euclidean_spearman |
|
value: 83.0499566200785 |
|
- type: main_score |
|
value: 84.3492527221498 |
|
- type: manhattan_pearson |
|
value: 82.68307441014618 |
|
- type: manhattan_spearman |
|
value: 83.01315787964519 |
|
- task: |
|
type: Summarization |
|
dataset: |
|
name: MTEB SummEval (default) |
|
type: mteb/summeval |
|
config: default |
|
split: test |
|
revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c |
|
metrics: |
|
- type: cosine_pearson |
|
value: 31.149232235402845 |
|
- type: cosine_spearman |
|
value: 30.685504130606255 |
|
- type: dot_pearson |
|
value: 27.466307571160375 |
|
- type: dot_spearman |
|
value: 28.93064261485915 |
|
- type: main_score |
|
value: 30.685504130606255 |
|
- type: pearson |
|
value: 31.149232235402845 |
|
- type: spearman |
|
value: 30.685504130606255 |
|
- task: |
|
type: semantic-similarity |
|
name: Semantic Similarity |
|
dataset: |
|
name: sts test 256 |
|
type: sts-test-256 |
|
metrics: |
|
- type: pearson_cosine |
|
value: 0.8264447022356382 |
|
name: Pearson Cosine |
|
- type: spearman_cosine |
|
value: 0.8386403752382455 |
|
name: Spearman Cosine |
|
- type: pearson_manhattan |
|
value: 0.8219134931449013 |
|
name: Pearson Manhattan |
|
- type: spearman_manhattan |
|
value: 0.825509659109493 |
|
name: Spearman Manhattan |
|
- type: pearson_euclidean |
|
value: 0.8223094468630248 |
|
name: Pearson Euclidean |
|
- type: spearman_euclidean |
|
value: 0.8260503151751462 |
|
name: Spearman Euclidean |
|
- type: pearson_dot |
|
value: 0.6375226884845725 |
|
name: Pearson Dot |
|
- type: spearman_dot |
|
value: 0.6287228614640888 |
|
name: Spearman Dot |
|
- type: pearson_max |
|
value: 0.8264447022356382 |
|
name: Pearson Max |
|
- type: spearman_max |
|
value: 0.8386403752382455 |
|
name: Spearman Max |
|
- task: |
|
type: semantic-similarity |
|
name: Semantic Similarity |
|
dataset: |
|
name: sts test 128 |
|
type: sts-test-128 |
|
metrics: |
|
- type: pearson_cosine |
|
value: 0.8209661910768973 |
|
name: Pearson Cosine |
|
- type: spearman_cosine |
|
value: 0.8347149482673766 |
|
name: Spearman Cosine |
|
- type: pearson_manhattan |
|
value: 0.8082811559854036 |
|
name: Pearson Manhattan |
|
- type: spearman_manhattan |
|
value: 0.8148314269262763 |
|
name: Spearman Manhattan |
|
- type: pearson_euclidean |
|
value: 0.8093138512113149 |
|
name: Pearson Euclidean |
|
- type: spearman_euclidean |
|
value: 0.8156468458613929 |
|
name: Spearman Euclidean |
|
- type: pearson_dot |
|
value: 0.5795109620454884 |
|
name: Pearson Dot |
|
- type: spearman_dot |
|
value: 0.5760223026552876 |
|
name: Spearman Dot |
|
- type: pearson_max |
|
value: 0.8209661910768973 |
|
name: Pearson Max |
|
- type: spearman_max |
|
value: 0.8347149482673766 |
|
name: Spearman Max |
|
- task: |
|
type: semantic-similarity |
|
name: Semantic Similarity |
|
dataset: |
|
name: sts test 64 |
|
type: sts-test-64 |
|
metrics: |
|
- type: pearson_cosine |
|
value: 0.808708530451336 |
|
name: Pearson Cosine |
|
- type: spearman_cosine |
|
value: 0.8217532539767914 |
|
name: Spearman Cosine |
|
- type: pearson_manhattan |
|
value: 0.7876121380998453 |
|
name: Pearson Manhattan |
|
- type: spearman_manhattan |
|
value: 0.7969092304137347 |
|
name: Spearman Manhattan |
|
- type: pearson_euclidean |
|
value: 0.7902997966909958 |
|
name: Pearson Euclidean |
|
- type: spearman_euclidean |
|
value: 0.7987635968785215 |
|
name: Spearman Euclidean |
|
- type: pearson_dot |
|
value: 0.495047136234386 |
|
name: Pearson Dot |
|
- type: spearman_dot |
|
value: 0.49287000679901516 |
|
name: Spearman Dot |
|
- type: pearson_max |
|
value: 0.808708530451336 |
|
name: Pearson Max |
|
- type: spearman_max |
|
value: 0.8217532539767914 |
|
name: Spearman Max |
|
--- |
|
|
|
# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) on the Omartificial-Intelligence-Space/Arabic-NLi-Triplet dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. This model is part of the [Arabic Matryoshka Embedding Models collection](https://huggingface.co/collections/Omartificial-Intelligence-Space/arabic-matryoshka-embedding-models-666f764d3b570f44d7f77d4e). It was presented in the paper [GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training](https://huggingface.co/papers/2505.24581). |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) <!-- at revision bf3bf13ab40c3157080a7ab344c831b9ad18b5eb --> |
|
- **Maximum Sequence Length:** 128 tokens |
|
- **Output Dimensionality:** 384 dimensions |
|
- **Similarity Function:** Cosine Similarity |
|
- **Training Dataset:** |
|
- Omartificial-Intelligence-Space/Arabic-NLi-Triplet |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
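
The metadata above lists `MatryoshkaLoss` and reports STS results at 256, 128, and 64 dimensions, which suggests the 384-dimensional output can be shortened by keeping only its leading components. The following is a minimal sketch of that truncation followed by cosine similarity, not the authors' evaluation code. It assumes the embeddings are already available as a NumPy array; random vectors stand in for real model outputs, and the helper name `truncate_and_normalize` is hypothetical.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then rescale rows to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for an all-zero row.
    return truncated / np.clip(norms, 1e-12, None)


# Placeholder data: three random 384-dimensional vectors standing in for
# sentence embeddings (assumption: real embeddings would come from the model).
rng = np.random.default_rng(0)
full_embeddings = rng.normal(size=(3, 384))

# Dimensions taken from the evaluation names in the metadata, plus the full size.
for dim in (384, 256, 128, 64):
    reduced = truncate_and_normalize(full_embeddings, dim)
    # For unit-length rows, the dot product equals the cosine similarity.
    similarities = reduced @ reduced.T
    print(dim, reduced.shape, float(similarities[0, 1]))
```

Re-normalizing after truncation matters because the leading components of a unit-length vector no longer have unit length on their own, so a plain dot product on truncated vectors would not be a cosine similarity.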
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging |