maux-gte-persian-v3 (fp16)

A high-performance Persian sentence embedding model based on Alibaba-NLP/gte-multilingual-base, released in fp16 for efficient inference.

Model Overview

This is the fp16 (half-precision) version of maux-gte-persian-v3, a Sentence Transformers model fine-tuned from Alibaba-NLP/gte-multilingual-base for robust Persian sentence and paragraph embeddings.
The fp16 format enables faster and more memory-efficient inference, especially on modern GPUs.
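The memory saving can be estimated with simple arithmetic: each fp16 weight takes 2 bytes instead of the 4 bytes used by fp32. The sketch below is illustrative only. The parameter count is an assumption chosen for the example, not a figure stated in this card, and the helper name `weight_size_mib` is hypothetical.

```python
# Bytes needed to store one parameter in each numeric format.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}


def weight_size_mib(num_params: int, dtype: str) -> float:
    """Return the approximate size of the raw weights in MiB."""
    return num_params * BYTES_PER_PARAM[dtype] / (1024 ** 2)


# ASSUMPTION: roughly 305 million parameters, used only for illustration.
ASSUMED_NUM_PARAMS = 305_000_000

fp32_mib = weight_size_mib(ASSUMED_NUM_PARAMS, "fp32")
fp16_mib = weight_size_mib(ASSUMED_NUM_PARAMS, "fp16")
print(f"fp32: {fp32_mib:.0f} MiB, fp16: {fp16_mib:.0f} MiB")
```

This covers the stored weights only. Activations and framework overhead add to the real footprint at inference time.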

Key Features:

Base Model: Alibaba-NLP/gte-multilingual-base
Fine-tuned on: mshojaei77/Persian_sft (80,000 Persian sentence pairs)
Output Dimension: 768
Max Sequence Length: 8192 tokens
Similarity Function: Cosine Similarity
Loss Function: MultipleNegativesRankingLoss
Format: fp16 (model.safetensors)

Performance

Excellent performance on Persian semantic similarity, search, and clustering tasks.
Outperforms or matches jinaai-v3 in most Persian benchmarks (see comparison charts).
Efficient for large-scale inference due to fp16 weights.

Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: NewModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, ...})
  (2): Normalize()
)
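The pooling and normalization stages listed above can be illustrated on toy data. This is a minimal pure-Python sketch of what CLS pooling followed by L2 normalization does conceptually, not the Sentence Transformers implementation. The function names and the 4-dimensional toy vectors are hypothetical (the real model outputs 768 dimensions).

```python
import math
from typing import List


def cls_pool(token_embeddings: List[List[float]]) -> List[float]:
    """CLS pooling: keep only the vector at the first token position."""
    return list(token_embeddings[0])


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy transformer output: 3 tokens, 4 dimensions each (hypothetical values).
toy_tokens = [
    [3.0, 0.0, 4.0, 0.0],  # first position, used by CLS pooling
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(toy_tokens))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the final vector has unit length, cosine similarity between two such embeddings reduces to a plain dot product.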

Usage

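from sentence_transformers import SentenceTransformer

# trust_remote_code=True is needed because the repository ships custom modeling code.
model = SentenceTransformer("xmanii/maux-gte-persian-v3-fp16", trust_remote_code=True)
sentences = [
    # "Milad Tower is in Tehran"
    'برج میلاد در تهران هست',
    # "One of Tehran's residential towers is Tehran Tower"
    'یکی از برج های مسکونی تهران برج تهران است',
    # "Tehran has many towers"
    'تهران برج های زیادی دارد'
]
# encode() returns a NumPy array by default, one row per sentence.
embeddings = model.encode(sentences)
print(embeddings.shape)  # (3, 768)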

# Compute pairwise cosine similarity (returns a torch.Tensor)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)  # torch.Size([3, 3])
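A similarity matrix like the one above is typically used to rank candidates against a query. The following is a small sketch of that ranking step on toy unit-length vectors, so it runs without downloading the model. The helper `rank_by_similarity` and the 2-dimensional vectors are hypothetical stand-ins for real 768-dimensional embeddings.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def rank_by_similarity(
    query: Sequence[float], candidates: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (candidate index, score) pairs sorted from most to least similar."""
    scores = [(i, dot(query, c)) for i, c in enumerate(candidates)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy unit-length vectors standing in for normalized sentence embeddings.
query = [1.0, 0.0]
candidates = [[0.6, 0.8], [0.0, 1.0], [0.8, 0.6]]

ranking = rank_by_similarity(query, candidates)
print([index for index, _ in ranking])  # [2, 0, 1]
```

With real embeddings, the rows of `model.encode(...)` would take the place of the toy vectors.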

Training Details

Dataset: mshojaei77/Persian_sft
Loss: MultipleNegativesRankingLoss (scale=20.0, similarity_fct="cos_sim")
Batch size: 64
Precision: bf16 during training, fp16 for this release
Frameworks: Python 3.10, Sentence Transformers 4.1.0, Transformers 4.51.3, PyTorch 2.7.0+cu126
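To make the loss configuration above more concrete, here is a minimal pure-Python sketch of the idea behind a multiple-negatives ranking loss: each anchor is scored against every positive in the batch with scaled cosine similarity, and cross-entropy pushes the matching pair above the in-batch negatives. This is an illustration under stated assumptions, not the Sentence Transformers implementation. The function names and toy vectors are hypothetical; only `scale=20.0` and the cosine similarity come from the settings listed above.

```python
import math
from typing import List, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mnr_loss(
    anchors: List[Sequence[float]],
    positives: List[Sequence[float]],
    scale: float = 20.0,
) -> float:
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    per_anchor = []
    for i, anchor in enumerate(anchors):
        # Every positive in the batch acts as a candidate; the others are negatives.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        per_anchor.append(log_sum_exp - logits[i])
    return sum(per_anchor) / len(per_anchor)


# Toy batch (hypothetical 2-dimensional embeddings).
anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]
swapped = [matched[1], matched[0]]

loss_matched = mnr_loss(anchors, matched)
loss_swapped = mnr_loss(anchors, swapped)
print(loss_matched < loss_swapped)  # True
```

Larger batches supply more in-batch negatives per anchor, which is one reason batch size matters for this kind of loss.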

Files

model.safetensors (fp16 weights)
All necessary config and tokenizer files
Custom code: modeling.py, configuration.py (required for loading)

Citation

If you use this model, please cite:

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Acknowledgements

Special thanks to mshojaei77 for the Persian_sft dataset.
Built on top of Alibaba-NLP/gte-multilingual-base.

License

This model is distributed under the same license as the base model and dataset.

For questions or feedback, please open an issue or discussion on the Hugging Face model page.
