---
base_model: aubmindlab/bert-base-arabertv02
datasets:
  - akhooli/arabic-triplets-1m-curated-sims-len
language:
  - ar
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - transformers.js
  - transformers
  - sentence-similarity
  - feature-extraction
  - dataset_size:75000
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
  - mteb
model-index:
  - name: Omartificial-Intelligence-Space/Arabert-matro-v4
    results:
      - dataset:
          config: ar-ar
          name: MTEB STS17 (ar-ar)
          revision: faeb762787bd10488a50c8b5be4a3b82e411949c
          split: test
          type: mteb/sts17-crosslingual-sts
        metrics:
          - type: cosine_pearson
            value: 84.66883392015258
          - type: cosine_spearman
            value: 85.30520907141938
          - type: euclidean_pearson
            value: 82.04306779342852
          - type: euclidean_spearman
            value: 84.58744201847996
          - type: main_score
            value: 85.30520907141938
          - type: manhattan_pearson
            value: 82.08829357724328
          - type: manhattan_spearman
            value: 84.49254541383544
        task:
          type: STS
license: apache-2.0
---

# Arabic-Triplet-Matryoshka-V2-Model

This is a sentence-transformers model fine-tuned from aubmindlab/bert-base-arabertv02.
It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
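A minimal semantic-search sketch follows. It assumes the model is available on the Hugging Face Hub under the ID given as the `model-index` name in the metadata (`Omartificial-Intelligence-Space/Arabert-matro-v4`) and that `sentence-transformers` is installed; `load_model`, `rank_by_cosine`, and `search` are hypothetical helper names used only for illustration. The ranking step is plain cosine similarity over the 768-dimensional embeddings, so it is written with NumPy.

```python
import numpy as np

# Assumption: Hub ID taken from the model-index name in this card's metadata.
MODEL_ID = "Omartificial-Intelligence-Space/Arabert-matro-v4"


def load_model(model_id: str = MODEL_ID):
    """Load the encoder; imported lazily so the ranking helper works without it."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def rank_by_cosine(query_emb: np.ndarray, corpus_embs: np.ndarray):
    """Return corpus indices sorted by cosine similarity to the query, plus the scores."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q  # one cosine score per corpus row
    order = np.argsort(-scores)  # highest similarity first
    return order, scores[order]


def search(model, query: str, corpus: list, top_k: int = 3):
    """Encode the query and corpus, then return the top_k (sentence, score) pairs."""
    query_emb = model.encode(query)  # shape (768,)
    corpus_embs = model.encode(corpus)  # shape (len(corpus), 768)
    order, scores = rank_by_cosine(query_emb, corpus_embs)
    return [(corpus[i], float(s)) for i, s in zip(order[:top_k], scores[:top_k])]
```

For example, `search(load_model(), "ما هي عاصمة فرنسا؟", ["باريس هي عاصمة فرنسا.", "القطط حيوانات أليفة."])` would be expected to rank the sentence about Paris first. Normalizing inside `rank_by_cosine` keeps the helper correct whether or not the encoder already returns unit-length vectors.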
It was trained on 1M samples from the akhooli/arabic-triplets-1m-curated-sims-len dataset for 3 epochs, reaching a final training loss of 0.718 (MatryoshkaLoss).
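To illustrate what the MatryoshkaLoss objective optimizes, here is a NumPy sketch of its forward computation; it is not the sentence-transformers implementation. It assumes each training example is an (anchor, positive, negative) triplet, that the inner loss is MultipleNegativesRankingLoss (in-batch negatives plus each triplet's hard negative, cosine similarity scaled by 20, the sentence-transformers default), and that the nested dimensions are 768, 512, 256, 128, and 64 with equal weights. The card does not state which dimensions or weights were actually used.

```python
import numpy as np

# Assumption: nested sizes and equal weights; the card does not list the ones used.
MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)


def _normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def mnrl_loss(anchors, positives, negatives, scale: float = 20.0) -> float:
    """MultipleNegativesRankingLoss: softmax cross-entropy where anchor i must
    pick positive i among all positives and hard negatives in the batch."""
    a = _normalize(anchors)
    candidates = _normalize(np.concatenate([positives, negatives], axis=0))
    logits = scale * (a @ candidates.T)  # (batch, 2 * batch) scaled cosine scores
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(a))
    return float(-log_probs[idx, idx].mean())  # correct candidate for anchor i is column i


def matryoshka_loss(anchors, positives, negatives, dims=MATRYOSHKA_DIMS, weights=None) -> float:
    """Weighted sum of the inner loss over embeddings truncated to each nested size."""
    weights = weights or [1.0] * len(dims)
    total = 0.0
    for d, w in zip(dims, weights):
        # Keep only the first d dimensions; mnrl_loss re-normalizes the truncated vectors.
        total += w * mnrl_loss(anchors[:, :d], positives[:, :d], negatives[:, :d])
    return total
```

Because every prefix is trained to work on its own, the embeddings can be shortened at inference time; for instance, sentence-transformers accepts `SentenceTransformer(model_id, truncate_dim=256)` to keep only the first 256 dimensions. The card does not report how much accuracy each smaller size gives up for this model.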

## Citation

If you use the Arabic Matryoshka Embeddings Model, please cite it as follows:

```bibtex
@misc{nacar2024enhancingsemanticsimilarityunderstanding,
      title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
      author={Omer Nacar and Anis Koubaa},
      year={2024},
      eprint={2407.21139},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.21139},
}
```