Arabic Text Scoring Regression Model

Model Description

This model is AraELECTRA fine-tuned as a regression model for scoring Arabic text answers. Given an Arabic text response, it predicts a single continuous score.

Training Data

The model was trained on the AraScore dataset, which contains Arabic text answers with corresponding scores.

Metrics

The model is evaluated with the following regression metrics:

MSE (Mean Squared Error)
RMSE (Root Mean Squared Error)
MAE (Mean Absolute Error)
R² (R-squared)
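
As a rough illustration of how these four metrics relate to one another, the sketch below computes them from paired reference and predicted scores using only the standard library. The function name `regression_metrics` and the sample numbers are hypothetical; they are not results from this model.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for two equal-length lists of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R² is undefined when every reference score is identical
    r2 = 1.0 - (mse * n) / ss_tot if ss_tot > 0 else float("nan")
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2}

# Hypothetical reference scores and predictions, for illustration only
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
metrics = regression_metrics(reference, predicted)
print(metrics)
```

RMSE is the square root of MSE, so it is expressed in the same units as the scores, while R² compares the squared error against that of always predicting the mean reference score.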

Usage

from transformers import AutoModelForSequenceClassification, AutoTokenizer
import torch
import re

# Load model and tokenizer
model_name = "kenzykhaled/arabic-answer-scoring"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Function to preprocess Arabic text
def preprocess_arabic_text(text):
    if not isinstance(text, str):
        return ""
    
    # Remove diacritics (تشكيل): U+064B to U+065F, plus superscript alif U+0670
    text = re.sub(r'[\u064B-\u065F\u0670]', '', text)
    
    # Normalize Arabic letters
    text = re.sub('[إأآا]', 'ا', text)  # Normalize Alif forms
    text = re.sub('ى', 'ي', text)      # Normalize Yaa
    text = re.sub('ة', 'ه', text)      # Normalize Taa Marbouta
    
    # Remove non-Arabic characters except spaces
    text = re.sub(r'[^\u0600-\u06FF\s]', '', text)
    
    # Remove extra spaces
    text = re.sub(r'\s+', ' ', text).strip()
    
    return text

# Define prediction function
def predict_score(text):
    # Preprocess and tokenize
    processed_text = preprocess_arabic_text(text)
    inputs = tokenizer(processed_text, return_tensors="pt", padding=True, truncation=True, max_length=256)
    
    # Move to appropriate device (GPU if available)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device)
    inputs = {k: v.to(device) for k, v in inputs.items()}
    
    # Predict
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
        score = outputs.logits.item()
    
    return score

# Example usage
sample_text = "هذه إجابة نموذجية باللغة العربية."
score = predict_score(sample_text)
print(f"Predicted score: {score}")
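
To see what the preprocessing step does to an input before it reaches the tokenizer, the sketch below restates the normalization rules from `preprocess_arabic_text` in standalone form and applies them to a mixed Arabic and Latin string. The name `normalize_arabic` and the sample input are hypothetical, and the character ranges are written as explicit Unicode escapes on the assumption that they match the ranges intended above.

```python
import re

def normalize_arabic(text):
    """Standalone restatement of the preprocessing rules, for illustration."""
    text = re.sub(r'[\u064B-\u065F\u0670]', '', text)              # strip diacritics
    text = re.sub(r'[\u0625\u0623\u0622\u0627]', '\u0627', text)   # alif forms -> bare alif
    text = re.sub('\u0649', '\u064A', text)                        # alif maqsura -> yaa
    text = re.sub('\u0629', '\u0647', text)                        # taa marbouta -> haa
    text = re.sub(r'[^\u0600-\u06FF\s]', '', text)                 # drop characters outside the Arabic block
    return re.sub(r'\s+', ' ', text).strip()                       # collapse whitespace

# The Arabic word for "answer" with full diacritics, followed by Latin text and a digit
sample = "\u0625\u0650\u062C\u064E\u0627\u0628\u064E\u0629\u064C Answer 1."
normalized = normalize_arabic(sample)
print(normalized)
```

In this sketch, Latin letters, ASCII digits, and the ASCII period are dropped, while Arabic punctuation such as ، and ؟ and Arabic-Indic digits are kept because they fall inside the U+0600 to U+06FF block.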

Limitations

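The model is optimized for scoring educational answers and may perform poorly on other kinds of text.
It works best on text similar to the AraScore training data.
The usage example truncates inputs to 256 tokens, so any text beyond that limit does not influence the predicted score.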

Citation

If you use this model, please cite:

@misc{arabic-scoring-model,
  author = {Your Name},
  title = {Arabic Text Answer Scoring Model},
  year = {2025},
  publisher = {Hugging Face}
}