Model Summary:

This model is a Sentence Transformer fine-tuned from Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2 for semantic textual similarity and information retrieval. It maps Arabic sentences and passages to dense vectors that can be used for semantic search, clustering, and text classification.
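Below is a minimal semantic-search sketch. It assumes the `sentence-transformers` package is installed and that the model loads from the Hugging Face Hub under the ID `mohamed2811/Muffakir_Embedding`. The helper names `cosine_similarity`, `rank_documents`, and `search` are hypothetical. Ranking is done in pure Python, so the scoring logic can be checked without downloading the model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_emb, doc_embs, top_k=3):
    """Return (doc_index, score) pairs sorted by descending cosine similarity."""
    scored = [(i, cosine_similarity(query_emb, d)) for i, d in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def search(query, documents, model_id="mohamed2811/Muffakir_Embedding", top_k=3):
    """Embed a query and candidate passages, then return the top_k passages.

    Assumes sentence-transformers is installed and the Hub model is reachable.
    The import is deferred so rank_documents() works without the library.
    The model is loaded on every call for brevity; reuse one instance in practice.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    query_emb = model.encode(query).tolist()    # 1-D array -> list of floats
    doc_embs = model.encode(documents).tolist()  # 2-D array -> list of lists
    ranked = rank_documents(query_emb, doc_embs, top_k)
    return [(documents[i], score) for i, score in ranked]
```

For example, calling `search(query, passages)` with an Arabic legal question as `query` and a list of statute excerpts as `passages` returns the three closest excerpts with their cosine scores.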

Dataset:

  • The dataset used for training is derived from Egyptian law books.
  • It consists of synthetic data generated using a Large Language Model (LLM).
  • The dataset contains 20,252 samples, formatted as question-answer pairs.
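The card does not publish the dataset's file format. The sketch below assumes the pairs are stored as JSON Lines with hypothetical `question` and `answer` fields. It shows how they could be turned into the (anchor, positive) column layout that pair-based losses such as MultipleNegativesRankingLoss consume, with each question as the anchor and its answer as the positive.

```python
import json
from typing import Iterable


def load_qa_pairs(lines: Iterable[str]) -> list[tuple[str, str]]:
    """Parse JSON Lines records into (question, answer) tuples.

    The "question" and "answer" field names are assumptions for illustration.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        pairs.append((record["question"], record["answer"]))
    return pairs


def to_columns(pairs: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Column layout for pair-based training: anchor first, positive second."""
    return {
        "anchor": [question for question, _ in pairs],
        "positive": [answer for _, answer in pairs],
    }
```

With the `datasets` library installed, `Dataset.from_dict(to_columns(pairs))` builds a dataset in this shape. For the 20,252 pairs described above, every question would become an anchor paired with its own answer.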

Key Features:

  • Vector Representation: 768-dimensional embeddings.
  • Training Loss: MatryoshkaLoss & MultipleNegativesRankingLoss.
  • Evaluation Metrics: cosine-similarity-based metrics (Accuracy, Precision, Recall, NDCG).
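The following pure-Python sketch illustrates how the two listed losses are commonly combined. MultipleNegativesRankingLoss treats the other answers in a batch as negatives for each question. MatryoshkaLoss sums that loss over truncated prefixes of the embedding so that shorter vectors remain useful. The scale of 20.0 mirrors the sentence-transformers default. The dimension list (768, 512, 256, 128, 64) and the equal weights are assumptions, because the card does not state the values used for this model. In the library itself, the combination is typically written as `MatryoshkaLoss(model, MultipleNegativesRankingLoss(model), matryoshka_dims=[...])`.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def truncate_and_normalize(vec: Vector, dim: int) -> list[float]:
    """Keep the first `dim` components and rescale them to unit length."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def mnrl_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: each anchor should score its own
    positive above every other positive in the batch (cross-entropy over
    scaled cosine similarities)."""
    a = [truncate_and_normalize(v, len(v)) for v in anchors]
    p = [truncate_and_normalize(v, len(v)) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax of the true pair
    return total / len(a)


def matryoshka_loss(anchors, positives, dims=(768, 512, 256, 128, 64), weights=None):
    """Weighted sum of the ranking loss computed on truncated prefixes.

    The default dims are an assumption; the card does not list them.
    """
    if weights is None:
        weights = [1.0] * len(dims)
    loss = 0.0
    for dim, weight in zip(dims, weights):
        a = [truncate_and_normalize(v, dim) for v in anchors]
        p = [truncate_and_normalize(v, dim) for v in positives]
        loss += weight * mnrl_loss(a, p)
    return loss
```

The same `truncate_and_normalize` step can be applied to the model's 768-dimensional outputs at query time to get smaller, faster indexes, typically at some cost in accuracy.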

The model is optimized for Arabic legal document retrieval and can also be applied to other Arabic NLP tasks.
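The card does not define the evaluation metrics listed under Key Features. The sketch below assumes the usual binary-relevance definitions for retrieval at a cutoff k, similar to those reported by sentence-transformers' InformationRetrievalEvaluator, and computes them per query. `retrieval_metrics` and `average_metrics` are hypothetical helper names.

```python
import math
from typing import Sequence, Set


def retrieval_metrics(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int) -> dict[str, float]:
    """Accuracy@k, Precision@k, Recall@k and NDCG@k for one query."""
    top_k = list(ranked_ids[:k])
    hits = [1 if doc_id in relevant_ids else 0 for doc_id in top_k]
    num_hits = sum(hits)
    # DCG with binary gains: a hit at 0-based rank r contributes 1 / log2(r + 2).
    dcg = sum(h / math.log2(rank + 2) for rank, h in enumerate(hits))
    # Ideal DCG: all relevant documents (up to k) placed at the top.
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant_ids), k)))
    return {
        f"accuracy@{k}": 1.0 if num_hits > 0 else 0.0,
        f"precision@{k}": num_hits / k,
        f"recall@{k}": num_hits / len(relevant_ids) if relevant_ids else 0.0,
        f"ndcg@{k}": dcg / idcg if idcg > 0 else 0.0,
    }


def average_metrics(per_query: Sequence[dict[str, float]]) -> dict[str, float]:
    """Mean of each metric over queries (all dicts must share the same keys)."""
    keys = per_query[0].keys()
    return {key: sum(q[key] for q in per_query) / len(per_query) for key in keys}
```

Averaging the per-query dictionaries over a held-out set of questions gives corpus-level scores that can be compared across checkpoints.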

Model Details:

  • Repository: mohamed2811/Muffakir_Embedding
  • Weights format: Safetensors
  • Model size: 135M parameters
  • Tensor type: F32
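As a rough sizing guide, the listed figures (135M parameters stored as F32, 768-dimensional embeddings) give the weight memory and per-vector storage shown below. These are back-of-the-envelope estimates that ignore activations, tokenizer files, and index overhead. The one-million-passage corpus is a hypothetical example.

```latex
\begin{aligned}
\text{weights} &\approx 135\times10^{6}\ \text{params}\times 4\ \text{B} = 5.4\times10^{8}\ \text{B}\approx 540\ \text{MB}\\
\text{one embedding} &= 768\times 4\ \text{B} = 3072\ \text{B} = 3\ \text{KiB}\\
10^{6}\ \text{embeddings} &= 10^{6}\times 3072\ \text{B}\approx 3.07\ \text{GB}
\end{aligned}
```

Truncating Matryoshka-trained embeddings to fewer dimensions shrinks the last two lines proportionally.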