Model Card for hossam87/bert-base-arabic-hate-speech

A fine-tuned BERT model that classifies Arabic text as Neutral, Offensive, Sexism, Religious Discrimination, or Racism.


Model Details

Model Description

This model is based on bert-base-multilingual-cased and fine-tuned on an Arabic social media dataset for hate speech detection.
It classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.
Intended uses include moderation, analytics, and academic research.

Training Details

Training Data

The model was fine-tuned on a labeled dataset of Arabic social media posts, manually annotated for the five target categories.

Training Procedure

  • Precision: Mixed precision (fp16)
  • Epochs: 4 (best model at epoch 3)
  • Batch size: 32
  • Learning rate: 3e-5
  • Optimizer: AdamW
  • Hardware: 2 x NVIDIA T4 GPUs (Kaggle)
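
As a rough illustration, the settings above could be mapped onto keyword arguments for transformers.TrainingArguments. This is a minimal sketch, not the author's training script: the helper name to_training_kwargs and the batch_is_global flag are hypothetical, and the card does not say whether the batch size of 32 is the total across both GPUs or the per-GPU value, nor which AdamW implementation was used.

```python
# Values copied from the training-procedure list above.
REPORTED = {
    "epochs": 4,
    "batch_size": 32,
    "learning_rate": 3e-5,
    "fp16": True,
    "num_gpus": 2,
}


def to_training_kwargs(reported, batch_is_global=True):
    """Build keyword arguments that could be passed to transformers.TrainingArguments.

    ASSUMPTION: the card does not state whether 32 is the total batch size
    or the per-GPU batch size; `batch_is_global` selects the interpretation.
    """
    per_device = reported["batch_size"]
    if batch_is_global:
        # Split the total batch evenly across the GPUs.
        per_device //= reported["num_gpus"]
    return {
        "num_train_epochs": reported["epochs"],
        "per_device_train_batch_size": per_device,
        "learning_rate": reported["learning_rate"],
        "fp16": reported["fp16"],  # mixed precision; needs a CUDA GPU when actually used
        "optim": "adamw_torch",    # ASSUMPTION: PyTorch AdamW; the card only says "AdamW"
    }


kwargs = to_training_kwargs(REPORTED)
print(kwargs["per_device_train_batch_size"])  # 16 under the global-batch assumption
```

Keeping the best checkpoint (epoch 3 here) would additionally require an evaluation and checkpointing setup that the card does not describe, so it is left out of the sketch.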

Evaluation

Metrics

Metric      Score
Accuracy    0.944
F1 Macro    0.946
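
For readers unfamiliar with the two metrics, the following sketch shows how accuracy and macro-averaged F1 are commonly computed. It is a plain-Python illustration on made-up toy labels, not the evaluation code or data behind the scores above; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (invented labels, for illustration only).
y_true = ["Neutral", "Neutral", "Offensive", "Racism"]
y_pred = ["Neutral", "Offensive", "Offensive", "Racism"]
print(accuracy(y_true, y_pred))            # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7778
```

Because macro F1 averages over classes rather than examples, it is the more informative of the two numbers when the five categories are unevenly represented.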

Uses

Direct Use

  • Content moderation for Arabic social media, forums, and chats.
  • Analytics and research into hate speech patterns in Arabic.
  • Educational and academic projects.

Out-of-Scope Use

  • Automated moderation without human oversight in sensitive or legal contexts.
  • Use on languages other than Arabic.
  • General text classification tasks outside hate speech detection.

Bias, Risks, and Limitations

The model may misclassify:

  • Sarcasm, slang, or context-dependent expressions.
  • Formal written Arabic, since the model was trained on social media content.
  • Domain-specific or emerging hate speech not represented in the training data.

Recommendations

Always keep a human in the loop for sensitive moderation tasks. Use the model responsibly and be transparent with users about automated decisions.
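
One way to keep a human in the loop is to route predictions by confidence instead of acting on them automatically. The sketch below assumes each prediction is a dict with "label" and "score" keys, as returned by a transformers text-classification pipeline. The threshold value, the route names, and the assumption that the model emits the label string "Neutral" are all hypothetical and should be checked against the model's actual configuration.

```python
# ASSUMPTION: hypothetical cut-off; tune it on held-out data before relying on it.
REVIEW_THRESHOLD = 0.80


def route(prediction, threshold=REVIEW_THRESHOLD):
    """Decide what to do with one prediction dict ({"label": ..., "score": ...})."""
    label, score = prediction["label"], prediction["score"]
    if score < threshold:
        # The model is unsure: always defer to a person.
        return "human_review"
    if label == "Neutral":  # ASSUMPTION: label string as named in this card
        return "allow"
    # Confident non-neutral predictions are flagged, never removed automatically.
    return "flag_for_moderator"


print(route({"label": "Neutral", "score": 0.97}))    # allow
print(route({"label": "Offensive", "score": 0.55}))  # human_review
print(route({"label": "Racism", "score": 0.93}))     # flag_for_moderator
```

In this sketch no content is removed without a moderator seeing it; the model only prioritizes what people look at.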

How to Get Started with the Model

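from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Load the tokenizer and the fine-tuned classification model from the Hub.
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Wrap both in a text-classification pipeline.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Sample input; in English: "This is an Arabic text for testing."
text = "هذا نص عربي للاختبار"
result = classifier(text)
print(result)  # a list with one dict holding the predicted label and its score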

Citation

@misc{hossam87_2025_arabichate,
  title = {BERT-base Arabic Hate Speech Detector},
  author = {Hossam87},
  year = {2025},
  howpublished = {\url{https://huggingface.co/hossam87/bert-base-arabic-hate-speech}},
}

Model size: 109M params (Safetensors, tensor type F32)