Model Summary:
This model is a Sentence Transformer fine-tuned from Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2 for semantic textual similarity and information retrieval. It maps sentences to dense vectors that can be used for search, clustering, and text classification.
Dataset:
- The dataset used for training is derived from Egyptian law books.
- It consists of synthetic data generated using a Large Language Model (LLM).
- The dataset contains 20,252 samples, formatted as question-answer pairs.
Key Features:
- Vector Representation: 768-dimensional embeddings.
- Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
- Evaluation Metrics: Cosine similarity-based metrics (Accuracy, Precision, Recall, NDCG).
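Because training used MatryoshkaLoss, the embeddings are generally intended to remain useful when only their leading components are kept. The following is a minimal sketch of that truncation step, using toy 8-dimensional vectors in place of real 768-dimensional embeddings. The helper names `truncate_and_normalize` and `cosine_similarity` are illustrative, and the truncation sizes this model was actually trained on are not stated here, so any reduced size is an assumption that should be validated on your own data.

```python
import math


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # an all-zero vector cannot be normalized
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy stand-ins for two embeddings (real ones would have 768 components).
full_a = [0.50, 0.10, -0.30, 0.20, 0.05, -0.02, 0.01, 0.03]
full_b = [0.45, 0.15, -0.25, 0.10, -0.04, 0.02, 0.00, 0.01]

# Hypothetical reduced size: keep the first 4 components only.
small_a = truncate_and_normalize(full_a, 4)
small_b = truncate_and_normalize(full_b, 4)

full_score = cosine_similarity(full_a, full_b)
small_score = cosine_similarity(small_a, small_b)
```

Comparing `full_score` with `small_score` on real embeddings gives a quick sense of how much ranking quality a smaller vector size costs.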
The model is optimized for retrieving Arabic legal documents and can also serve other Arabic NLP applications.
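A retrieval workflow with this model could be sketched as follows. This is not the author's published usage code: it assumes the `sentence-transformers` package is installed and that the model can be loaded under the ID `mohamed2811/Muffakir_Embedding`. The helpers `load_model`, `rank_passages`, and `search` are hypothetical names introduced for illustration.

```python
import math

MODEL_ID = "mohamed2811/Muffakir_Embedding"  # assumed Hugging Face model ID


def load_model(model_id=MODEL_ID):
    """Load the model; the import is deferred so the helpers below work without it."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def rank_passages(query_vec, passage_vecs):
    """Return (index, cosine score) pairs sorted from most to least similar."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    scores = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def search(model, query, passages, top_k=3):
    """Embed the query and passages together, then return the top_k passages."""
    vectors = model.encode([query] + list(passages))
    query_vec = [float(x) for x in vectors[0]]
    passage_vecs = [[float(x) for x in v] for v in vectors[1:]]
    ranked = rank_passages(query_vec, passage_vecs)[:top_k]
    return [(passages[i], score) for i, score in ranked]


# Example call (downloads the model on first use, so it is left commented out):
# model = load_model()
# results = search(model, "ما هي عقوبة السرقة؟", ["نص قانوني أول", "نص قانوني ثان"])
```

For larger collections, passage embeddings would normally be computed once and stored in a vector index rather than re-encoded on every query.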
Model tree for mohamed2811/Muffakir_Embedding
Base model: aubmindlab/bert-base-arabertv02