AraBERTv2+D3Tok+Reg Readability Model

Model description

AraBERTv2+D3Tok+Reg is a readability assessment model that was built by fine-tuning the AraBERTv2 model with Mean Squared Error loss (Reg). For the fine-tuning, we used the D3Tok input variant from BAREC-Corpus-v1.0. Our fine-tuning procedure and the hyperparameters we used can be found in our paper "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment."

Intended uses

You can use the AraBERTv2+D3Tok+Reg model with the transformers text-classification pipeline. Note that your text must first be preprocessed into the D3Tok input variant using the accompanying preprocessing step.

How to use

To use the model:

from transformers import pipeline

readability = pipeline("text-classification", model="CAMeL-Lab/readability-arabertv2-d3tok-reg")

# One D3Tok-preprocessed sentence per line
with open("/PATH/TO/preprocessed_d3tok", "r") as f:
    sentences = f.read().split("\n")

# function_to_apply="none" returns the raw regression scores
# instead of passing them through an activation function
results = readability(sentences, function_to_apply="none")

# Round each score to an integer level, with a minimum level of 1
readability_levels = [max(round(result["score"] + 0.5), 1) for result in results]
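The last line converts each raw regression score into an integer readability level with a floor of 1. The same mapping, factored into a standalone helper for clarity (the function name is ours, not part of the model's API):

```python
def score_to_level(score: float) -> int:
    """Map a raw regression score to an integer readability level.

    Mirrors the expression max(round(score + 0.5), 1) used above:
    the score is shifted by 0.5 before rounding, and the result is
    clamped so the lowest possible level is 1.
    """
    return max(round(score + 0.5), 1)


# Example scores such as the pipeline might return:
print(score_to_level(3.2))   # 3.7 rounds to level 4
print(score_to_level(-1.0))  # negative scores are clamped to level 1
```

Note that because the scores are shifted before rounding, a score of exactly 3.2 maps to level 4, not 3; the shift effectively moves the rounding boundary to the half-level below each integer.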

Citation

@inproceedings{elmadani-etal-2025-readability,
    title = "A Large and Balanced Corpus for Fine-grained Arabic Readability Assessment",
    author = "Elmadani, Khalid N.  and
      Habash, Nizar  and
      Taha-Thomure, Hanada",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics"
}
Model size: 135M parameters (Safetensors, F32)
