This is a fine-tuned version of the MARBERTv2 transformer for Arabic dialect classification, trained on the QADI dataset. The model classifies input text into one of 18 country-level Arabic dialects. For the 5-dialect version, see: https://huggingface.co/oahmedd/MARBERTv2-Finetuned-on-5-Dialects-QADI
Dataset:
QADI (Qatar Arabic Dialect Identification)
440,000 tweets
https://huggingface.co/datasets/Abdelrahman-Rezk/Arabic_Dialect_Identification by Abdelrahman Rezk et al.
Covers 18 Arabic dialects
Evaluation:
Metrics: 65% accuracy and F1-score
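For readers who want to compute comparable numbers on their own predictions, here is a minimal sketch of accuracy and F1 in plain Python. It assumes macro-averaged F1 (the card does not say which averaging was used), and the `gold` and `pred` lists are made-up illustrations, not QADI data.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in gold or predictions.

    Assumption: macro averaging; the reported F1 may use a different scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # gold class t was missed
    scores = []
    for c in set(y_true) | set(y_pred):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for illustration only
gold = ["Egyptian", "Saudi", "Iraqi", "Egyptian"]
pred = ["Egyptian", "Saudi", "Egyptian", "Egyptian"]
print(f"accuracy={accuracy(gold, pred):.2f}, macro-F1={macro_f1(gold, pred):.2f}")
```

With 18 classes of uneven size, macro and weighted F1 can differ noticeably, so it is worth stating which one is reported when comparing against the figure above.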
Usage:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
model = AutoModelForSequenceClassification.from_pretrained("oahmedd/MARBERTv2-Finetuned-on-QADI-dataset")
tokenizer = AutoTokenizer.from_pretrained("oahmedd/MARBERTv2-Finetuned-on-QADI-dataset")
For dialect classification inference:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Maps each class index predicted by the model to a dialect name
DIALECT_LABELS = [
    "Omani", "Sudanese", "Saudi", "Kuwaiti", "Qatari", "Lebanese", "Jordanian",
    "Syrian", "Iraqi", "Moroccan", "Egyptian", "Palestinian", "Yemeni", "Bahraini",
    "Algerian", "Emirati", "Tunisian", "Libyan"
]

model = AutoModelForSequenceClassification.from_pretrained(
    "oahmedd/MARBERTv2-Finetuned-on-QADI-dataset",
    num_labels=18
)
tokenizer = AutoTokenizer.from_pretrained("oahmedd/MARBERTv2-Finetuned-on-QADI-dataset")
model.eval()  # disable dropout for inference

text = "ازيك يصاحبي عامل ايه، ايه الاخبار"
inputs = tokenizer(
    text,
    return_tensors="pt",
    truncation=True,
    padding=True,
)

# Forward pass without gradient tracking
with torch.inference_mode():
    logits = model(**inputs).logits

# Pick the highest-scoring class and look up its dialect name
prediction = torch.argmax(logits, dim=1).item()
predicted_dialect = DIALECT_LABELS[prediction]
print(f"Predicted Dialect: {predicted_dialect}")
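Since the model reaches about 65% accuracy, it can help to look at the top few candidates rather than only the argmax. The sketch below turns raw logits into probabilities with a softmax and returns the `k` most likely dialects. `top_k_dialects` is a hypothetical helper, not part of the model's API; it assumes the label order from the inference snippet above, and the example logits are invented. In real use you would pass `model(**inputs).logits[0].tolist()`.

```python
import math

# Label order copied from the inference snippet; assumed to match the model's class indices
DIALECT_LABELS = [
    "Omani", "Sudanese", "Saudi", "Kuwaiti", "Qatari", "Lebanese", "Jordanian",
    "Syrian", "Iraqi", "Moroccan", "Egyptian", "Palestinian", "Yemeni", "Bahraini",
    "Algerian", "Emirati", "Tunisian", "Libyan"
]


def top_k_dialects(logits, labels=DIALECT_LABELS, k=3):
    """Return the k most probable (label, probability) pairs for one input."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    # Numerically stable softmax: subtract the max before exponentiating
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for illustration only (not real model output)
example_logits = [0.0] * 18
example_logits[10] = 4.0  # index of "Egyptian"
example_logits[11] = 2.0  # index of "Palestinian"

for label, prob in top_k_dialects(example_logits):
    print(f"{label}: {prob:.3f}")
```

A small gap between the first and second probabilities is a reasonable signal that the input is ambiguous between closely related dialects.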
Citation:
If you find this helpful, please cite our work.
@article{essameldin2025arabic,
  title={Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis},
  author={Essameldin, Omar A and Elbeih, Ali O and Gomaa, Wael H and Elsersy, Wael F},
  journal={arXiv preprint arXiv:2506.19753},
  year={2025}
}
Base model: UBC-NLP/MARBERTv2