Cross-Encoder for Persian Scientific Relevance Ranking
This is a cross-encoder model based on xlm-roberta-large, fine-tuned for relevance ranking of Persian scientific texts. It takes a question and a document (an abstract) as input and outputs a score from 0 to 1 indicating their relevance.
This model was trained as a reranker for a Persian scientific Question Answering system.
Model Details
- Base Model: xlm-roberta-large
- Task: Reranking / Sentence Similarity
- Fine-tuning Framework: sentence-transformers
- Language: Persian (fa)
Intended Use
The primary use of this model is to act as a reranker in a search or question-answering pipeline. Given a user's query and a list of candidate documents retrieved by a faster first-stage model (like BM25 or a bi-encoder), this cross-encoder can re-score the top candidates to provide a more accurate final ranking.
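A minimal sketch of such a two-stage pipeline is shown below; the commented-out `retrieve_candidates` call stands in for a hypothetical first-stage retriever and is not part of this repository:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder('YOUR_HF_USERNAME/reranker-xlm-roberta-large')

def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[tuple[float, str]]:
    """Re-score first-stage candidates and return the top_k most relevant."""
    # Score every (query, document) pair with the cross-encoder
    scores = reranker.predict([[query, doc] for doc in candidates])
    # Sort candidates by descending relevance score
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return ranked[:top_k]

# candidates = retrieve_candidates(query)  # hypothetical BM25 / bi-encoder stage
# results = rerank(query, candidates)
```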
How to Use
To use the model, first install the sentence-transformers library:

```bash
pip install -U sentence-transformers
```
```python
from sentence_transformers import CrossEncoder

# Load the model from the Hugging Face Hub
model_name = 'YOUR_HF_USERNAME/reranker-xlm-roberta-large' # <-- IMPORTANT: Replace with your model name!
model = CrossEncoder(model_name)

# Prepare your query and document pairs
query = "روش های ارزیابی در بازیابی اطلاعات چیست؟" # "What are the evaluation methods in information retrieval?"
documents = [
    "بازیابی اطلاعات یک فرآیند پیچیده است که شامل شاخص گذاری و جستجوی اسناد می شود. ارزیابی آن اغلب با معیارهایی مانند دقت و بازیابی انجام می شود.", # "Information retrieval is a complex process involving indexing and searching documents. Its evaluation is often done with metrics like precision and recall."
    "یادگیری عمیق در سال های اخیر پیشرفت های چشمگیری در پردازش زبان طبیعی داشته است.", # "Deep learning has made significant progress in natural language processing in recent years."
    "این مقاله به بررسی روش های جدید برای ارزیابی سیستم های بازیابی اطلاعات معنایی می پردازد و معیارهای نوینی را معرفی می کند." # "This paper examines new methods for evaluating semantic information retrieval systems and introduces novel metrics."
]

# Create pairs for scoring
sentence_pairs = [[query, doc] for doc in documents]

# Predict the scores
scores = model.predict(sentence_pairs, convert_to_numpy=True)

# Print results
for score, doc in zip(scores, documents):
    print(f"Score: {score:.4f}\t Document: {doc}")

# Expected output (scores will vary but should follow this trend):
# Score: 0.9123  Document: This paper examines new methods for evaluating semantic information retrieval systems and introduces novel metrics.
# Score: 0.7543  Document: Information retrieval is a complex process involving indexing and searching documents. Its evaluation is often done with metrics like precision and recall.
# Score: 0.0123  Document: Deep learning has made significant progress in natural language processing in recent years.
```
Training Data
This model was fine-tuned on the PersianSciQA dataset.
- Description: PersianSciQA is a large-scale dataset containing 39,809 Persian scientific question-answer pairs. It was generated in a two-stage process with gpt-4o-mini on a corpus of scientific abstracts from IranDoc's 'Ganj' repository.
- Content: The dataset consists of questions paired with scientific abstracts, primarily from engineering fields.
- Labels: Each pair has a relevance score from 0 (Not Relevant) to 3 (Highly Relevant), which was normalized to a 0-1 float for training (see the sketch below).
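A minimal sketch of that normalization as sentence-transformers training examples; the field names 'question', 'abstract', and 'label' are assumptions, not the dataset's actual schema:

```python
from sentence_transformers import InputExample

# Convert one dataset row into a training example.
# The 0-3 ordinal relevance label is mapped to a 0-1 float target.
def to_input_example(row: dict) -> InputExample:
    return InputExample(
        texts=[row["question"], row["abstract"]],
        label=row["label"] / 3.0,  # 0 -> 0.0, 1 -> 0.33, 2 -> 0.67, 3 -> 1.0
    )
```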
Training Procedure
The model was trained using the provided train_reranker.py script with the following configuration (a sketch of an equivalent setup follows the list):
- Epochs: 2
- Batch Size: 16
- Learning Rate: 2e-5
- Loss Function: MSELoss
- Evaluator: CECorrelationEvaluator, used to save the best model based on Spearman's rank correlation on the validation set.
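The training script itself is not reproduced here, but a minimal sketch with the classic CrossEncoder.fit API, using dummy samples in place of the real train/dev splits, might look like this:

```python
import torch.nn as nn
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample
from sentence_transformers.cross_encoder.evaluation import CECorrelationEvaluator

# Dummy placeholders for the real PersianSciQA train/dev splits
train_samples = [InputExample(texts=["سوال نمونه", "چکیده نمونه"], label=1.0)]  # "sample question" / "sample abstract"
dev_samples = [
    InputExample(texts=["سوال نمونه", "چکیده مرتبط"], label=1.0),    # "sample question" / "a relevant abstract"
    InputExample(texts=["سوال نمونه", "چکیده نامرتبط"], label=0.0),  # "sample question" / "an unrelated abstract"
]

model = CrossEncoder("xlm-roberta-large", num_labels=1)
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=16)
evaluator = CECorrelationEvaluator.from_input_examples(dev_samples, name="dev")

model.fit(
    train_dataloader=train_dataloader,
    evaluator=evaluator,
    epochs=2,
    loss_fct=nn.MSELoss(),                     # regression on the normalized 0-1 labels
    optimizer_params={"lr": 2e-5},
    output_path="reranker-xlm-roberta-large",  # best checkpoint (by Spearman) saved here
)
```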
Evaluation
The PersianSciQA paper reports substantial agreement between the LLM-assigned labels used for training and human expert judgments (Cohen's kappa of 0.6642). The accompanying human validation study also confirmed the quality of the generated questions (88.60% rated acceptable) and of the relevance assessments.
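To run a similar check against your own labeled pairs, you can correlate the model's predicted scores with gold labels using Spearman's rank correlation. A small sketch, where the evaluation pairs are illustrative rather than taken from the dataset:

```python
from scipy.stats import spearmanr
from sentence_transformers import CrossEncoder

model = CrossEncoder('YOUR_HF_USERNAME/reranker-xlm-roberta-large')

# Illustrative (question, abstract, gold relevance in [0, 1]) triples
eval_pairs = [
    ("روش های ارزیابی در بازیابی اطلاعات چیست؟", "ارزیابی بازیابی اطلاعات اغلب با معیارهایی مانند دقت و بازیابی انجام می شود.", 1.0),   # relevant abstract
    ("روش های ارزیابی در بازیابی اطلاعات چیست؟", "این مقاله معیارهای نوینی برای ارزیابی معرفی می کند.", 0.67),                        # partially relevant abstract
    ("روش های ارزیابی در بازیابی اطلاعات چیست؟", "یادگیری عمیق در پردازش زبان طبیعی پیشرفت کرده است.", 0.0),                          # unrelated abstract
]

# Score each (question, abstract) pair and correlate with the gold labels
predictions = model.predict([[q, d] for q, d, _ in eval_pairs])
gold = [label for _, _, label in eval_pairs]
rho, _ = spearmanr(predictions, gold)
print(f"Spearman's rho: {rho:.4f}")
```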
Citation
If you use this model or the PersianSciQA dataset in your research, please cite the original paper.
(Note: The provided paper is a pre-print. Please update the citation information once it is officially published.)
```bibtex
@inproceedings{PersianSciQA2025,
  title={PersianSciQA: A new Dataset for Bridging the Language Gap in Scientific Question Answering},
  author={Anonymous},
  year={2025},
  booktitle={Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP)},
  note={Confidential review copy. To be updated upon publication.}
}
```