---
base_model: aubmindlab/bert-base-arabertv02
datasets:
- akhooli/arabic-triplets-1m-curated-sims-len
language:
- ar
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- transformers.js
- transformers
- sentence-similarity
- feature-extraction
- dataset_size:75000
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
- mteb
model-index:
- name: Omartificial-Intelligence-Space/Arabert-matro-v4
  results:
  - dataset:
      config: ar-ar
      name: MTEB STS17 (ar-ar)
      revision: faeb762787bd10488a50c8b5be4a3b82e411949c
      split: test
      type: mteb/sts17-crosslingual-sts
    metrics:
    - type: cosine_pearson
      value: 84.66883392015258
    - type: cosine_spearman
      value: 85.30520907141938
    - type: euclidean_pearson
      value: 82.04306779342852
    - type: euclidean_spearman
      value: 84.58744201847996
    - type: main_score
      value: 85.30520907141938
    - type: manhattan_pearson
      value: 82.08829357724328
    - type: manhattan_spearman
      value: 84.49254541383544
    task:
      type: STS
license: apache-2.0
---
# Arabic-Triplet-Matryoshka-V2-Model
This is a sentence-transformers model fine-tuned from aubmindlab/bert-base-arabertv02.
It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.
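The following is a minimal sketch of how such embeddings might be used for semantic similarity ranking. It assumes the `sentence-transformers` package is installed; the `MODEL_ID` value is copied from the model-index name in the metadata and may not match the actual Hub repository id. The helper names (`cosine_similarity`, `rank_by_similarity`, `demo`) and the Arabic example sentences are illustrative, not part of the model's API.

```python
import math
from typing import List, Sequence

# Assumption: repository id taken from the model-index name in the metadata;
# replace it with the actual Hub id if it differs.
MODEL_ID = "Omartificial-Intelligence-Space/Arabert-matro-v4"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: Sequence[float],
                       doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices ordered from most to least similar."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def demo() -> None:
    """Encode a few sentences and rank them (downloads the model when called)."""
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    # Hypothetical example sentences: one query followed by two candidates.
    sentences = ["الطقس جميل اليوم", "الجو مشمس ولطيف", "أحب قراءة الكتب"]
    embeddings = model.encode(sentences)  # one row per sentence
    for i in rank_by_similarity(embeddings[0], embeddings[1:]):
        print(sentences[1 + i])
```

Calling `demo()` requires network access to fetch the model weights; the two helper functions work on any list of vectors.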
This model was trained on 1M samples from the akhooli/arabic-triplets-1m-curated-sims-len dataset.
It was trained for 3 epochs using MatryoshkaLoss, reaching a final training loss of 0.718.
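Models trained with MatryoshkaLoss are generally intended to remain useful when only a leading prefix of each embedding is kept. The sketch below shows that truncation step in plain Python. The target dimension of 256 is a hypothetical choice, since the nested dimensions used in training are not stated here, and `truncate_and_normalize` is an illustrative helper rather than a library function.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if dim <= 0 or dim > len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    prefix = [float(x) for x in vec[:dim]]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        return prefix  # an all-zero prefix cannot be normalized
    return [x / norm for x in prefix]


# Hypothetical usage: shrink a 768-dimensional embedding to 256 dimensions.
full_embedding = [0.01 * (i % 7) for i in range(768)]  # stand-in for a real embedding
small_embedding = truncate_and_normalize(full_embedding, 256)
print(len(small_embedding))
```

Recent versions of `sentence-transformers` also accept a `truncate_dim` argument when constructing `SentenceTransformer`, which applies the truncation at encoding time. Smaller dimensions trade some accuracy for lower storage and faster search, so the chosen size is worth validating on your own data.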
## Citation
If you use the Arabic Matryoshka Embeddings Model, please cite it as follows:
@misc{nacar2024enhancingsemanticsimilarityunderstanding,
title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
author={Omer Nacar and Anis Koubaa},
year={2024},
eprint={2407.21139},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2407.21139},
}