	---
	base_model: sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
	datasets:
	- Omartificial-Intelligence-Space/Arabic-NLi-Triplet
	language:
	- ar
	library_name: sentence-transformers
	license: apache-2.0
	metrics:
	- pearson_cosine
	- spearman_cosine
	- pearson_manhattan
	- spearman_manhattan
	- pearson_euclidean
	- spearman_euclidean
	- pearson_dot
	- spearman_dot
	- pearson_max
	- spearman_max
	pipeline_tag: feature-extraction
	tags:
	- mteb
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:557850
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	widget:
	- source_sentence: ذكر متوازن بعناية يقف على قدم واحدة بالقرب من منطقة شاطئ المحيط
	    النظيفة
	  sentences:
	  - رجل يقدم عرضاً
	  - هناك رجل بالخارج قرب الشاطئ
	  - رجل يجلس على أريكه
	- source_sentence: رجل يقفز إلى سريره القذر
	  sentences:
	  - السرير قذر.
	  - رجل يضحك أثناء غسيل الملابس
	  - الرجل على القمر
	- source_sentence: الفتيات بالخارج
	  sentences:
	  - امرأة تلف الخيط إلى كرات بجانب كومة من الكرات
	  - فتيان يركبان في جولة متعة
	  - ثلاث فتيات يقفون سوية في غرفة واحدة تستمع وواحدة تكتب على الحائط والثالثة تتحدث
	    إليهن
	- source_sentence: الرجل يرتدي قميصاً أزرق.
	  sentences:
	  - رجل يرتدي قميصاً أزرق يميل إلى الجدار بجانب الطريق مع شاحنة زرقاء وسيارة حمراء
	    مع الماء في الخلفية.
	  - كتاب القصص مفتوح
	  - رجل يرتدي قميص أسود يعزف على الجيتار.
	- source_sentence: يجلس شاب ذو شعر أشقر على الحائط يقرأ جريدة بينما تمر امرأة وفتاة
	    شابة.
	  sentences:
	  - ذكر شاب ينظر إلى جريدة بينما تمر إمرأتان بجانبه
	  - رجل يستلقي على وجهه على مقعد في الحديقة.
	  - الشاب نائم بينما الأم تقود ابنتها إلى الحديقة
	model-index:
	- name: SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
	results:
	- task:
	type: Retrieval
	dataset:
	name: MTEB MintakaRetrieval (ar)
	type: mintaka/mmteb-mintaka
	config: ar
	split: test
	revision: efa78cc2f74bbcd21eff2261f9e13aebe40b814e
	metrics:
	- type: main_score
	value: 12.493
	- type: map_at_1
	value: 5.719
	- type: map_at_3
	value: 8.269
	- type: map_at_5
	value: 9.172
	- type: map_at_10
	value: 9.894
	- type: ndcg_at_1
	value: 5.719
	- type: ndcg_at_3
	value: 9.128
	- type: ndcg_at_5
	value: 10.745
	- type: ndcg_at_10
	value: 12.493
	- type: recall_at_1
	value: 5.719
	- type: recall_at_3
	value: 11.621
	- type: recall_at_5
	value: 15.524
	- type: recall_at_10
	value: 20.926
	- type: precision_at_1
	value: 5.719
	- type: precision_at_3
	value: 3.874
	- type: precision_at_5
	value: 3.105
	- type: precision_at_10
	value: 2.093
	- type: mrr_at_1
	value: 5.7195
	- type: mrr_at_3
	value: 8.269
	- type: mrr_at_5
	value: 9.1723
	- type: mrr_at_10
	value: 9.8942
	- task:
	type: Retrieval
	dataset:
	name: MTEB MIRACLRetrievalHardNegatives (ar)
	type: miracl/mmteb-miracl-hardnegatives
	config: ar
	split: dev
	revision: 95c8db7d4a6e9c1d8a60601afd63d553ae20a2eb
	metrics:
	- type: main_score
	value: 22.396
	- type: map_at_1
	value: 8.866
	- type: map_at_3
	value: 13.905
	- type: map_at_5
	value: 15.326
	- type: map_at_10
	value: 16.851
	- type: ndcg_at_1
	value: 13.9
	- type: ndcg_at_3
	value: 17.309
	- type: ndcg_at_5
	value: 19.174
	- type: ndcg_at_10
	value: 22.396
	- type: recall_at_1
	value: 8.866
	- type: recall_at_3
	value: 19.177
	- type: recall_at_5
	value: 23.999
	- type: recall_at_10
	value: 32.421
	- type: precision_at_1
	value: 13.9
	- type: precision_at_3
	value: 10.933
	- type: precision_at_5
	value: 8.5
	- type: precision_at_10
	value: 5.96
	- type: mrr_at_1
	value: 13.9
	- type: mrr_at_3
	value: 20.0667
	- type: mrr_at_5
	value: 21.3617
	- type: mrr_at_10
	value: 22.7531
	- task:
	type: Retrieval
	dataset:
	name: MTEB MLQARetrieval (ar)
	type: mlqa/mmteb-mlqa
	config: ar
	split: validation
	revision: 397ed406c1a7902140303e7faf60fff35b58d285
	metrics:
	- type: main_score
	value: 57.312
	- type: map_at_1
	value: 44.487
	- type: map_at_3
	value: 50.516
	- type: map_at_5
	value: 51.715
	- type: map_at_10
	value: 52.778
	- type: ndcg_at_1
	value: 44.487
	- type: ndcg_at_3
	value: 52.586
	- type: ndcg_at_5
	value: 54.742
	- type: ndcg_at_10
	value: 57.312
	- type: recall_at_1
	value: 44.487
	- type: recall_at_3
	value: 58.607
	- type: recall_at_5
	value: 63.83
	- type: recall_at_10
	value: 71.76
	- type: precision_at_1
	value: 44.487
	- type: precision_at_3
	value: 19.536
	- type: precision_at_5
	value: 12.766
	- type: precision_at_10
	value: 7.176
	- type: mrr_at_1
	value: 44.4874
	- type: mrr_at_3
	value: 50.5158
	- type: mrr_at_5
	value: 51.715
	- type: mrr_at_10
	value: 52.7782
	- task:
	type: Retrieval
	dataset:
	name: MTEB SadeemQuestionRetrieval (ar)
	type: sadeem/mmteb-sadeem
	config: default
	split: test
	revision: 3cb0752b182e5d5d740df547748b06663c8e0bd9
	metrics:
	- type: main_score
	value: 52.976
	- type: map_at_1
	value: 22.307
	- type: map_at_3
	value: 41.727
	- type: map_at_5
	value: 43.052
	- type: map_at_10
	value: 43.844
	- type: ndcg_at_1
	value: 22.307
	- type: ndcg_at_3
	value: 48.7
	- type: ndcg_at_5
	value: 51.057
	- type: ndcg_at_10
	value: 52.976
	- type: recall_at_1
	value: 22.307
	- type: recall_at_3
	value: 69.076
	- type: recall_at_5
	value: 74.725
	- type: recall_at_10
	value: 80.661
	- type: precision_at_1
	value: 22.307
	- type: precision_at_3
	value: 23.025
	- type: precision_at_5
	value: 14.945
	- type: precision_at_10
	value: 8.066
	- type: mrr_at_1
	value: 21.0148
	- type: mrr_at_3
	value: 40.8808
	- type: mrr_at_5
	value: 42.1254
	- type: mrr_at_10
	value: 42.9125
	- task:
	type: STS
	dataset:
	name: MTEB BIOSSES (default)
	type: mteb/biosses-sts
	config: default
	split: test
	revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
	metrics:
	- type: cosine_pearson
	value: 72.5081840952171
	- type: cosine_spearman
	value: 69.41362982941537
	- type: euclidean_pearson
	value: 67.45121490183709
	- type: euclidean_spearman
	value: 67.15273493989758
	- type: main_score
	value: 69.41362982941537
	- type: manhattan_pearson
	value: 67.6119022794479
	- type: manhattan_spearman
	value: 67.51659865246586
	- task:
	type: STS
	dataset:
	name: MTEB SICK-R (default)
	type: mteb/sickr-sts
	config: default
	split: test
	revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
	metrics:
	- type: cosine_pearson
	value: 83.61591268324493
	- type: cosine_spearman
	value: 79.61914245705792
	- type: euclidean_pearson
	value: 81.32044881859483
	- type: euclidean_spearman
	value: 79.04866675279919
	- type: main_score
	value: 79.61914245705792
	- type: manhattan_pearson
	value: 81.09220518201322
	- type: manhattan_spearman
	value: 78.87590523907905
	- task:
	type: STS
	dataset:
	name: MTEB STS12 (default)
	type: mteb/sts12-sts
	config: default
	split: test
	revision: a0d554a64d88156834ff5ae9920b964011b16384
	metrics:
	- type: cosine_pearson
	value: 84.59807803376341
	- type: cosine_spearman
	value: 77.38689922564416
	- type: euclidean_pearson
	value: 83.92034850646732
	- type: euclidean_spearman
	value: 76.75857193093438
	- type: main_score
	value: 77.38689922564416
	- type: manhattan_pearson
	value: 83.97191863964667
	- type: manhattan_spearman
	value: 76.89790070725708
	- task:
	type: STS
	dataset:
	name: MTEB STS13 (default)
	type: mteb/sts13-sts
	config: default
	split: test
	revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
	metrics:
	- type: cosine_pearson
	value: 78.18664268536664
	- type: cosine_spearman
	value: 79.58989311630421
	- type: euclidean_pearson
	value: 79.25259731614729
	- type: euclidean_spearman
	value: 80.1701122827397
	- type: main_score
	value: 79.58989311630421
	- type: manhattan_pearson
	value: 79.12601451996869
	- type: manhattan_spearman
	value: 79.98999436073663
	- task:
	type: STS
	dataset:
	name: MTEB STS14 (default)
	type: mteb/sts14-sts
	config: default
	split: test
	revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
	metrics:
	- type: cosine_pearson
	value: 80.97541876658141
	- type: cosine_spearman
	value: 79.78614320477877
	- type: euclidean_pearson
	value: 81.01514505747167
	- type: euclidean_spearman
	value: 80.73664735567839
	- type: main_score
	value: 79.78614320477877
	- type: manhattan_pearson
	value: 80.8746560526314
	- type: manhattan_spearman
	value: 80.67025673179079
	- task:
	type: STS
	dataset:
	name: MTEB STS15 (default)
	type: mteb/sts15-sts
	config: default
	split: test
	revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
	metrics:
	- type: cosine_pearson
	value: 85.23661155813113
	- type: cosine_spearman
	value: 86.21134464371615
	- type: euclidean_pearson
	value: 85.82518684522182
	- type: euclidean_spearman
	value: 86.43600784349509
	- type: main_score
	value: 86.21134464371615
	- type: manhattan_pearson
	value: 85.83101152371589
	- type: manhattan_spearman
	value: 86.42228695679498
	- task:
	type: STS
	dataset:
	name: MTEB STS16 (default)
	type: mteb/sts16-sts
	config: default
	split: test
	revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
	metrics:
	- type: cosine_pearson
	value: 79.20106689077852
	- type: cosine_spearman
	value: 81.39570893867825
	- type: euclidean_pearson
	value: 80.39578888768929
	- type: euclidean_spearman
	value: 81.19950443340412
	- type: main_score
	value: 81.39570893867825
	- type: manhattan_pearson
	value: 80.2226679341839
	- type: manhattan_spearman
	value: 80.99142422593823
	- task:
	type: STS
	dataset:
	name: MTEB STS17 (ar-ar)
	type: mteb/sts17-crosslingual-sts
	config: ar-ar
	split: test
	revision: faeb762787bd10488a50c8b5be4a3b82e411949c
	metrics:
	- type: cosine_pearson
	value: 81.05294851623468
	- type: cosine_spearman
	value: 81.10570655134113
	- type: euclidean_pearson
	value: 79.22292773537778
	- type: euclidean_spearman
	value: 78.84204232638425
	- type: main_score
	value: 81.10570655134113
	- type: manhattan_pearson
	value: 79.43750460320484
	- type: manhattan_spearman
	value: 79.33713593557482
	- task:
	type: STS
	dataset:
	name: MTEB STS22 (ar)
	type: mteb/sts22-crosslingual-sts
	config: ar
	split: test
	revision: de9d86b3b84231dc21f76c7b7af1f28e2f57f6e3
	metrics:
	- type: cosine_pearson
	value: 45.96875498680092
	- type: cosine_spearman
	value: 52.405509117149904
	- type: euclidean_pearson
	value: 42.097450896728226
	- type: euclidean_spearman
	value: 50.89022884113707
	- type: main_score
	value: 52.405509117149904
	- type: manhattan_pearson
	value: 42.22827727075534
	- type: manhattan_spearman
	value: 50.912841055442634
	- task:
	type: STS
	dataset:
	name: MTEB STSBenchmark (default)
	type: mteb/stsbenchmark-sts
	config: default
	split: test
	revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
	metrics:
	- type: cosine_pearson
	value: 83.13261516884116
	- type: cosine_spearman
	value: 84.3492527221498
	- type: euclidean_pearson
	value: 82.691603178401
	- type: euclidean_spearman
	value: 83.0499566200785
	- type: main_score
	value: 84.3492527221498
	- type: manhattan_pearson
	value: 82.68307441014618
	- type: manhattan_spearman
	value: 83.01315787964519
	- task:
	type: Summarization
	dataset:
	name: MTEB SummEval (default)
	type: mteb/summeval
	config: default
	split: test
	revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
	metrics:
	- type: cosine_pearson
	value: 31.149232235402845
	- type: cosine_spearman
	value: 30.685504130606255
	- type: dot_pearson
	value: 27.466307571160375
	- type: dot_spearman
	value: 28.93064261485915
	- type: main_score
	value: 30.685504130606255
	- type: pearson
	value: 31.149232235402845
	- type: spearman
	value: 30.685504130606255
	- task:
	type: semantic-similarity
	name: Semantic Similarity
	dataset:
	name: sts test 256
	type: sts-test-256
	metrics:
	- type: pearson_cosine
	value: 0.8264447022356382
	name: Pearson Cosine
	- type: spearman_cosine
	value: 0.8386403752382455
	name: Spearman Cosine
	- type: pearson_manhattan
	value: 0.8219134931449013
	name: Pearson Manhattan
	- type: spearman_manhattan
	value: 0.825509659109493
	name: Spearman Manhattan
	- type: pearson_euclidean
	value: 0.8223094468630248
	name: Pearson Euclidean
	- type: spearman_euclidean
	value: 0.8260503151751462
	name: Spearman Euclidean
	- type: pearson_dot
	value: 0.6375226884845725
	name: Pearson Dot
	- type: spearman_dot
	value: 0.6287228614640888
	name: Spearman Dot
	- type: pearson_max
	value: 0.8264447022356382
	name: Pearson Max
	- type: spearman_max
	value: 0.8386403752382455
	name: Spearman Max
	- task:
	type: semantic-similarity
	name: Semantic Similarity
	dataset:
	name: sts test 128
	type: sts-test-128
	metrics:
	- type: pearson_cosine
	value: 0.8209661910768973
	name: Pearson Cosine
	- type: spearman_cosine
	value: 0.8347149482673766
	name: Spearman Cosine
	- type: pearson_manhattan
	value: 0.8082811559854036
	name: Pearson Manhattan
	- type: spearman_manhattan
	value: 0.8148314269262763
	name: Spearman Manhattan
	- type: pearson_euclidean
	value: 0.8093138512113149
	name: Pearson Euclidean
	- type: spearman_euclidean
	value: 0.8156468458613929
	name: Spearman Euclidean
	- type: pearson_dot
	value: 0.5795109620454884
	name: Pearson Dot
	- type: spearman_dot
	value: 0.5760223026552876
	name: Spearman Dot
	- type: pearson_max
	value: 0.8209661910768973
	name: Pearson Max
	- type: spearman_max
	value: 0.8347149482673766
	name: Spearman Max
	- task:
	type: semantic-similarity
	name: Semantic Similarity
	dataset:
	name: sts test 64
	type: sts-test-64
	metrics:
	- type: pearson_cosine
	value: 0.808708530451336
	name: Pearson Cosine
	- type: spearman_cosine
	value: 0.8217532539767914
	name: Spearman Cosine
	- type: pearson_manhattan
	value: 0.7876121380998453
	name: Pearson Manhattan
	- type: spearman_manhattan
	value: 0.7969092304137347
	name: Spearman Manhattan
	- type: pearson_euclidean
	value: 0.7902997966909958
	name: Pearson Euclidean
	- type: spearman_euclidean
	value: 0.7987635968785215
	name: Spearman Euclidean
	- type: pearson_dot
	value: 0.495047136234386
	name: Pearson Dot
	- type: spearman_dot
	value: 0.49287000679901516
	name: Spearman Dot
	- type: pearson_max
	value: 0.808708530451336
	name: Pearson Max
	- type: spearman_max
	value: 0.8217532539767914
	name: Spearman Max
	---

	# SentenceTransformer based on sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

	This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) on the Omartificial-Intelligence-Space/Arabic-NLi-Triplet dataset. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. This model is part of the [Arabic Matryoshka Embedding Models collection](https://huggingface.co/collections/Omartificial-Intelligence-Space/arabic-matryoshka-embedding-models-666f764d3b570f44d7f77d4e). It was presented in the paper [GATE: General Arabic Text Embedding for Enhanced Semantic Textual Similarity with Matryoshka Representation Learning and Hybrid Loss Training](https://huggingface.co/papers/2505.24581).
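
To make the semantic-search use case concrete, here is a minimal sketch of ranking documents against a query by cosine similarity. It does not load this model: the 4-dimensional vectors are hypothetical stand-ins for the 384-dimensional embeddings the model would produce, and the helper names `cosine_similarity` and `rank_by_similarity` are illustrative, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, corpus):
    """Return (index, score) pairs sorted from most to least similar."""
    scores = [(i, cosine_similarity(query, doc)) for i, doc in enumerate(corpus)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors; real embeddings from the model would have 384 values.
query = [1.0, 0.0, 1.0, 0.0]
corpus = [
    [0.9, 0.1, 0.8, 0.0],    # nearly the same direction as the query
    [0.0, 1.0, 0.0, 1.0],    # orthogonal to the query
    [-1.0, 0.0, -1.0, 0.0],  # opposite direction
]

ranking = rank_by_similarity(query, corpus)
for index, score in ranking:
    print(index, round(score, 3))
```

In practice the vectors would come from encoding Arabic sentences with the model, and the same ranking step would apply unchanged.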

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2) <!-- at revision bf3bf13ab40c3157080a7ab344c831b9ad18b5eb -->
	- Maximum Sequence Length: 128 tokens
	- Output Dimensionality: 384 dimensions
	- Similarity Function: Cosine Similarity
	- Training Dataset:
	    - Omartificial-Intelligence-Space/Arabic-NLi-Triplet
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->
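
The front matter lists `loss:MatryoshkaLoss` and reports STS results at 256, 128, and 64 dimensions. A common way to use Matryoshka-trained embeddings at a smaller size is to keep the leading components and re-normalize before computing cosine similarity. The sketch below illustrates that step only, under the assumption that this model's embeddings are meant to be truncated this way. The vectors are deterministic placeholders rather than real model outputs, and `truncate_and_normalize` and `cosine_at_dim` are hypothetical helper names.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` components and rescale them to unit length."""
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def cosine_at_dim(a, b, dim):
    """Cosine similarity of two vectors after truncating both to `dim`."""
    a_t = truncate_and_normalize(a, dim)
    b_t = truncate_and_normalize(b, dim)
    # Both vectors are unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a_t, b_t))


FULL_DIM = 384  # output dimensionality stated in the model description

# Placeholder "embeddings": assumed values for illustration only.
emb_a = [math.sin(0.1 * i) for i in range(FULL_DIM)]
emb_b = [math.sin(0.1 * i + 0.2) for i in range(FULL_DIM)]

for dim in (384, 256, 128, 64):
    print(dim, round(cosine_at_dim(emb_a, emb_b, dim), 4))
```

Smaller dimensions trade some accuracy for lower storage and faster comparison; the `sts-test-256`, `sts-test-128`, and `sts-test-64` entries in the front matter show how the reported correlations change at each size.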

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- **Hugging