
This is a fine-tuned version of the MARBERTv2 transformer for Arabic dialect classification on the QADI dataset. The model classifies input text into one of 18 country-level Arabic dialects from across the Arab world. For the 5-dialect version, see: https://huggingface.co/oahmedd/MARBERTv2-Finetuned-on-5-Dialects-QADI

Dataset:

QADI (Qatar Arabic Dialect Identification)

440,000 tweets

https://huggingface.co/datasets/Abdelrahman-Rezk/Arabic_Dialect_Identification by Abdelrahman Rezk et al.

Covers 18 Arabic dialects
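
The dataset can be pulled directly from the Hugging Face Hub with the datasets library; a minimal sketch (the split and column layout shown by the print calls is whatever the dataset card provides, not something guaranteed here):

from datasets import load_dataset

# Load the dialect identification dataset from the Hub.
# Inspect the printed DatasetDict to see the actual splits and columns.
ds = load_dataset("Abdelrahman-Rezk/Arabic_Dialect_Identification")
print(ds)              # available splits and their sizes
print(ds["train"][0])  # one example (tweet text and its dialect label)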

Evaluation:

Metrics: 65% accuracy and F1-score
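
For reference, accuracy and F1 are typically computed with scikit-learn; a minimal sketch with placeholder label ids (for illustration only, not the actual evaluation script used for the numbers above):

from sklearn.metrics import accuracy_score, f1_score

# Placeholder gold and predicted label ids, for illustration only.
y_true = [0, 5, 10, 10, 3]
y_pred = [0, 5, 10, 2, 3]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))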

Usage:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("oahmedd/MARBERTv2-Finetuned-on-QADI-dataset")
tokenizer = AutoTokenizer.from_pretrained("oahmedd/MARBERTv2-Finetuned-on-QADI-dataset")
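
Alternatively, the model can be called through the transformers pipeline API; a minimal sketch (whether readable dialect names or generic LABEL_k ids are returned depends on the id2label mapping stored in the model config):

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="oahmedd/MARBERTv2-Finetuned-on-QADI-dataset",
)
print(classifier("ازيك يصاحبي عامل ايه، ايه الاخبار"))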

For dialect classification inference:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Label index i corresponds to the model's output id i.
DIALECT_LABELS = [
    "Omani", "Sudanese", "Saudi", "Kuwaiti", "Qatari", "Lebanese", "Jordanian",
    "Syrian", "Iraqi", "Moroccan", "Egyptian", "Palestinian", "Yemeni", "Bahraini",
    "Algerian", "Emirati", "Tunisian", "Libyan"
]

model = AutoModelForSequenceClassification.from_pretrained(
  "oahmedd/MARBERTv2-Finetuned-on-QADI-dataset",
  num_labels=18
)
tokenizer = AutoTokenizer.from_pretrained("oahmedd/MARBERTv2-Finetuned-on-QADI-dataset")

model.eval()  # switch to evaluation mode (disables dropout)

text = "ازيك يصاحبي عامل ايه، ايه الاخبار"

inputs = tokenizer(
  text,
  return_tensors="pt",
  truncation=True,
  padding=True,
)

with torch.inference_mode():
  logits = model(**inputs).logits
  prediction = torch.argmax(logits, dim=1).item()
predicted_dialect = DIALECT_LABELS[prediction]

print(f"Predicted Dialect: {predicted_dialect}")

Citation:

If you find this helpful, please cite our work.

@article{essameldin2025arabic,
  title={Arabic Dialect Classification using RNNs, Transformers, and Large Language Models: A Comparative Analysis},
  author={Essameldin, Omar A and Elbeih, Ali O and Gomaa, Wael H and Elsersy, Wael F},
  journal={arXiv preprint arXiv:2506.19753},
  year={2025}
}