Model Card for hossam87/bert-base-arabic-hate-speech
A fine-tuned BERT model to classify Arabic text into: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.
Model Details
Model Description
This model is based on bert-base-multilingual-cased and fine-tuned on an Arabic social media dataset for hate speech detection.
It classifies Arabic text into one of five categories: Neutral, Offensive, Sexism, Religious Discrimination, or Racism.
Intended uses include moderation, analytics, and academic research.
- Developed by: hossam87
- Model type: Sequence classification (BERT)
- Language(s): Arabic (ar)
- License: MIT
- Finetuned from model: bert-base-multilingual-cased
Model Sources
- Repository: https://huggingface.co/hossam87/bert-base-arabic-hate-speech
- Demo: https://huggingface.co/spaces/hossam87/arabic-hate-speech-detector
Training Details
Training Data
The model was fine-tuned on a labeled dataset of Arabic social media posts, manually annotated for the five target categories.
Training Procedure
- Precision: Mixed precision (fp16)
- Epochs: 4 (best model at epoch 3)
- Batch size: 32
- Learning rate: 3e-5
- Optimizer: AdamW
- Hardware: 2 x NVIDIA T4 GPUs (Kaggle)
Evaluation
Metrics
Metric | Score |
---|---|
Accuracy | 0.944 |
F1 Macro | 0.946 |
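For readers unfamiliar with the two metrics, the sketch below shows how accuracy and macro-averaged F1 are conventionally defined for a multi-class task. It is not the author's evaluation script; the function names and the toy labels are assumptions made for illustration only.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the gold label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy labels for illustration only; not drawn from the model's evaluation set.
y_true = ["Neutral", "Offensive", "Racism", "Neutral"]
y_pred = ["Neutral", "Offensive", "Neutral", "Neutral"]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

Because macro F1 weights every class equally, a rare class that is often missed lowers it more than it lowers accuracy.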
Uses
Direct Use
- Content moderation for Arabic social media, forums, and chats.
- Analytics and research into hate speech patterns in Arabic.
- Educational and academic projects.
Out-of-Scope Use
- Automated moderation without human oversight in sensitive or legal contexts.
- Use on languages other than Arabic.
- General text classification tasks outside hate speech detection.
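One way to guard against non-Arabic input is a cheap script check before calling the classifier. The sketch below is an assumption, not part of the model: the helper names and the 0.5 threshold are hypothetical, and it detects Arabic script rather than the Arabic language, so Persian or Urdu text would still pass.

```python
def arabic_ratio(text):
    """Share of alphabetic characters in the main Arabic Unicode blocks."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    arabic = sum(
        1
        for ch in letters
        # U+0600-U+06FF (Arabic) and U+0750-U+077F (Arabic Supplement)
        if "\u0600" <= ch <= "\u06FF" or "\u0750" <= ch <= "\u077F"
    )
    return arabic / len(letters)

def looks_arabic(text, min_ratio=0.5):
    """Hypothetical gate: True when at least min_ratio of the letters are Arabic script."""
    return arabic_ratio(text) >= min_ratio

print(looks_arabic("هذا نص عربي للاختبار"))
print(looks_arabic("This is English text"))
```

Text that fails the check can be skipped or sent to a different model instead of being classified here.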
Bias, Risks, and Limitations
The model may misclassify:
- Sarcasm, slang, or context-dependent expressions.
- Formal written Arabic, since the model was trained on social media content.
- Domain-specific or emerging hate speech not represented in the training data.
Recommendations
Always keep a human in the loop for sensitive moderation tasks. Use the model responsibly and be transparent about where automation is applied.
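A simple way to keep a human in the loop is to send low-confidence predictions to manual review. The sketch below assumes predictions arrive as dicts with "label" and "score" keys, the shape returned by a transformers text-classification pipeline. The function names, the 0.9 threshold, and the sample scores are hypothetical and should be tuned on real data.

```python
def needs_human_review(prediction, threshold=0.9):
    """True when the model's confidence falls below the (assumed) threshold."""
    return prediction["score"] < threshold

def split_for_review(predictions, threshold=0.9):
    """Partition predictions into (auto_handled, human_review) lists."""
    auto, review = [], []
    for pred in predictions:
        if needs_human_review(pred, threshold):
            review.append(pred)
        else:
            auto.append(pred)
    return auto, review

# Made-up scores for illustration; not real model output.
sample = [
    {"label": "Offensive", "score": 0.97},
    {"label": "Racism", "score": 0.55},
]
auto, review = split_for_review(sample)
print(len(auto), len(review))
```

In sensitive or legal contexts it may be safer to review every non-neutral prediction regardless of score.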
How to Get Started with the Model
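from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
# Load the fine-tuned tokenizer and classification model from the Hugging Face Hub.
model_id = "hossam87/bert-base-arabic-hate-speech"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
# Wrap both in a text-classification pipeline.
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
# Classify a sample sentence ("This is an Arabic text for testing").
text = "هذا نص عربي للاختبار"
result = classifier(text)
print(result)  # a list with one dict holding the top "label" and its "score"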
Citation
@misc{hossam87_2025_arabichate,
title = {BERT-base Arabic Hate Speech Detector},
author = {Hossam87},
year = {2025},
howpublished = {\url{https://huggingface.co/hossam87/bert-base-arabic-hate-speech}},
}