---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- loss:MultipleNegativesRankingLoss
- mteb
base_model: NAMAA-Space/AraModernBert-Base-V1.0
widget:
- source_sentence: الذكاء الاصطناعي يغير طريقة تفاعلنا مع التكنولوجيا.
sentences:
- التكنولوجيا تتطور بسرعة بفضل الذكاء الاصطناعي.
- الذكاء الاصطناعي يسهم في تطوير التطبيقات الذكية.
- تحديات الذكاء الاصطناعي تشمل الحفاظ على الأمان والأخلاقيات.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
model-index:
- name: NAMAA-Space/AraModernBert-Base-V1.0
results:
- dataset:
config: ar-ar
name: MTEB STS17 (ar-ar)
revision: faeb762787bd10488a50c8b5be4a3b82e411949c
split: test
type: mteb/sts17-crosslingual-sts
metrics:
- type: pearson
value: 82.4888
- type: spearman
value: 83.0981
- type: cosine_pearson
value: 82.4888
- type: cosine_spearman
value: 83.1109
- type: manhattan_pearson
value: 81.2051
- type: manhattan_spearman
value: 83.0197
- type: euclidean_pearson
value: 81.1013
- type: euclidean_spearman
value: 82.8922
- type: main_score
value: 83.1109
task:
type: STS
- dataset:
config: ar
name: MTEB STS22.v2 (ar)
revision: d31f33a128469b20e357535c39b82fb3c3f6f2bd
split: test
type: mteb/sts22-crosslingual-sts
metrics:
- type: pearson
value: 52.58540000000001
- type: spearman
value: 61.7371
- type: cosine_pearson
value: 52.58540000000001
- type: cosine_spearman
value: 61.7371
- type: manhattan_pearson
value: 55.887299999999996
- type: manhattan_spearman
value: 61.3654
- type: euclidean_pearson
value: 55.633500000000005
- type: euclidean_spearman
value: 61.2124
- type: main_score
value: 61.7371
task:
type: STS
license: apache-2.0
language:
- ar
---
# SentenceTransformer based on NAMAA-Space/AraModernBert-Base-V1.0
This SentenceTransformer model is fine-tuned from [NAMAA-Space/AraModernBert-Base-V1.0](https://huggingface.co/NAMAA-Space/AraModernBert-Base-V1.0) and produces strong Arabic embeddings useful for a wide range of use cases.
🔹 **768-dimensional dense vectors** 🎯
🔹 **Excels in**: Semantic Similarity, Search, Paraphrase Mining, Clustering, Text Classification & More!
🔹 **Optimized for speed & efficiency** without sacrificing performance
Whether you're building intelligent search engines, chatbots, or AI-powered knowledge graphs, this model delivers meaningful representations of Arabic text with precision and depth.
Try it out & bring Arabic NLP to the next level! 🔥✨
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
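For reference, the pooling configuration above corresponds to attention-masked mean pooling over ModernBERT token embeddings. The snippet below is a minimal sketch of that mapping using plain 🤗 Transformers rather than the recommended Sentence Transformers usage; the checkpoint id is the same one used in the Usage section below.
```python
# Sketch only: reproduces the Transformer + mean-pooling stack described above.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "NAMAA-Space/AraModernBert-Base-STS"  # same checkpoint as in the Usage section
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

sentences = ["الذكاء الاصطناعي يغير طريقة تفاعلنا مع التكنولوجيا."]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling over non-padding tokens, as configured in the Pooling module above
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)  # torch.Size([1, 768])
```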
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("NAMAA-Space/AraModernBert-Base-STS")
# Run inference
sentences = [
    'الذكاء الاصطناعي يغير طريقة تفاعلنا مع التكنولوجيا.',
    'التكنولوجيا تتطور بسرعة بفضل الذكاء الاصطناعي.',
    'الذكاء الاصطناعي يسهم في تطوير التطبيقات الذكية.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
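Beyond pairwise similarity, the same embeddings can drive semantic search. The following is a small illustrative sketch using `sentence_transformers.util.semantic_search`; the corpus and query sentences are made-up examples, not taken from any training data.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("NAMAA-Space/AraModernBert-Base-STS")

# Illustrative corpus and query
corpus = [
    "التكنولوجيا تتطور بسرعة بفضل الذكاء الاصطناعي.",
    "الذكاء الاصطناعي يسهم في تطوير التطبيقات الذكية.",
    "تحديات الذكاء الاصطناعي تشمل الحفاظ على الأمان والأخلاقيات.",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("كيف يغير الذكاء الاصطناعي التكنولوجيا؟", convert_to_tensor=True)

# Retrieve the two most similar corpus sentences for the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], round(hit["score"], 3))
```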
## Evaluation
### Metrics
#### Semantic Similarity
* Datasets: `STS17` and `STS22.v2`
* Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
| Metric | STS17 | STS22.v2 |
|:--------------------|:----------|:-----------|
| pearson_cosine | 0.8249 | 0.5259 |
| **spearman_cosine** | **0.831** | **0.6169** |
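If you want to run a comparable STS evaluation yourself, the sketch below shows one way to do it with the `EmbeddingSimilarityEvaluator`. The dataset column names (`sentence1`, `sentence2`, `score`) and the 0–5 gold score scale are assumptions about `mteb/sts17-crosslingual-sts`, so adjust them as needed.
```python
# Sketch of an STS17 (ar-ar) style evaluation; column names and score scale are assumptions.
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SimilarityFunction
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("NAMAA-Space/AraModernBert-Base-STS")
data = load_dataset("mteb/sts17-crosslingual-sts", "ar-ar", split="test")

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=data["sentence1"],
    sentences2=data["sentence2"],
    scores=[s / 5.0 for s in data["score"]],  # normalize gold scores to [0, 1]
    main_similarity=SimilarityFunction.COSINE,
    name="sts17-ar-ar",
)
print(evaluator(model))  # dict of Pearson/Spearman correlations
```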
### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.4.1
- Transformers: 4.49.0
- PyTorch: 2.1.0+cu118
- Accelerate: 1.4.0
- Datasets: 2.21.0
- Tokenizers: 0.21.0
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```