DarijaBERT Fine-Tuned for Sentiment Analysis πŸ‡²πŸ‡¦πŸ§ 

This sentiment analysis model is based on DarijaBERT, a language model pretrained on Moroccan Arabic (Darija) text.
The model has been fine-tuned to classify Moroccan Arabic tweets and public comments into three sentiment categories:

  • Neutral (0)
  • Negative (1)
  • Positive (2)

πŸ›  Model Architecture

The base DarijaBERT architecture was extended with:

  • Two fully connected layers of 1024 neurons each
  • Dropout layer (p=0.3) to enhance generalization
  • Final classification layer with 3 output neurons (one for each sentiment class)
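The head described above can be sketched in PyTorch. Layer sizes and ordering are inferred from the list; the hidden size of 768 is assumed from DarijaBERT's BERT-base architecture, and the module names here are illustrative, not the checkpoint's actual names:

```python
import torch
import torch.nn as nn

class SentimentHead(nn.Module):
    """Sketch of the described head: two 1024-unit fully connected
    layers, dropout (p=0.3), and a 3-way output layer."""
    def __init__(self, hidden_size=768, num_labels=3):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, 1024)
        self.fc2 = nn.Linear(1024, 1024)
        self.dropout = nn.Dropout(p=0.3)
        self.out = nn.Linear(1024, num_labels)

    def forward(self, pooled):  # pooled: (batch, hidden_size)
        x = torch.relu(self.fc1(pooled))
        x = torch.relu(self.fc2(x))
        x = self.dropout(x)
        return self.out(x)      # logits: (batch, num_labels)

head = SentimentHead()
logits = head(torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 3])
```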

🧠 Initial Training Details (Moroccan Tweets)

  • Dataset: 17,441 Moroccan tweets
    • 9,894 positive tweets (56.73%)
    • 4,039 neutral tweets (23.16%)
    • 3,508 negative tweets (20.11%)
  • Training Framework: Hugging Face Trainer API
  • Hyperparameters:
    • Learning rate: 1e-5
    • Batch size: 16 (with gradient accumulation = 32)
    • Weight decay: 0.01
    • EarlyStoppingCallback: Training stopped automatically at 92% accuracy
    • Epochs: Up to 20
  • Evaluation Strategy: Evaluated after every epoch, best model saved
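These settings map onto the Trainer API roughly as follows. This is a configuration sketch, not the author's actual script: `model`, `train_ds`, and `eval_ds` are placeholders, the early-stopping patience is assumed, and `evaluation_strategy` is named `eval_strategy` in newer transformers releases:

```python
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback

args = TrainingArguments(
    output_dir="darijabert-sentiment",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=32,
    weight_decay=0.01,
    num_train_epochs=20,
    evaluation_strategy="epoch",   # evaluate after every epoch
    save_strategy="epoch",
    load_best_model_at_end=True,   # keep the best checkpoint
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,                   # placeholder: fine-tuned DarijaBERT
    args=args,
    train_dataset=train_ds,        # placeholder datasets
    eval_dataset=eval_ds,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
# trainer.train()
```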

Performance:

  • Accuracy: 87%
  • F1 Score: 87%
  • Precision: 88%
  • Cohen's Kappa: 0.80
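Cohen's kappa corrects raw agreement for the agreement expected by chance, ΞΊ = (p_o βˆ’ p_e) / (1 βˆ’ p_e). A minimal pure-Python illustration of the formula, using made-up labels rather than the model's actual predictions:

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: chance-corrected agreement between two labelings."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # expected agreement if the two labelings were independent
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Toy example (0=Neutral, 1=Negative, 2=Positive), not real model output
y_true = [0, 0, 1, 1, 2, 2, 2, 0]
y_pred = [0, 1, 1, 1, 2, 2, 0, 0]
print(round(cohens_kappa(y_true, y_pred), 3))  # 0.628
```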

πŸ”₯ Fine-Tuning Details (Cash Transfer Public Policy 2023)

  • Dataset: 1,344 Moroccan comments from YouTube and Hespress
    • 515 neutral
    • 505 negative
    • 324 positive
  • Split: 80% training / 20% testing
  • Hyperparameters:
    • Learning rate: 5e-6
    • Batch size: 32
    • Maximum sequence length: 256 tokens
    • Warmup ratio: 0.1
    • Early stopping enabled
    • Class weights adjusted for imbalance
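The exact weighting scheme is not stated, but a common choice for imbalance is inverse-frequency class weights; a minimal sketch using the fine-tuning class counts above (normalized so a perfectly balanced dataset would give every class weight 1.0):

```python
counts = {"neutral": 515, "negative": 505, "positive": 324}
total = sum(counts.values())   # 1,344 comments
n_classes = len(counts)

# Inverse-frequency weights: rarer classes get larger weights
weights = {label: total / (n_classes * c) for label, c in counts.items()}
for label, w in weights.items():
    print(f"{label}: {w:.3f}")
# neutral: 0.870
# negative: 0.887
# positive: 1.383
```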

Performance:

  • Accuracy: 91.6%
  • Precision: 0.916
  • Recall: 0.916
  • F1 Score: 0.916
  • Cohen's Kappa: 0.872

πŸ“₯ How to Use the Model

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model = AutoModelForSequenceClassification.from_pretrained("monsifnadir/DarijaBERT-For-Sentiment-Analysis")
tokenizer = AutoTokenizer.from_pretrained("monsifnadir/DarijaBERT-For-Sentiment-Analysis")
model.eval()  # disable dropout for inference

text = "فرحΨͺ بزاف Ψ§Ω„ΩŠΩˆΩ… Ψ§Ω„Ψ­Ω…Ψ― Ω„Ω„Ω‡"  # "I'm very happy today, thank God"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()

# Map prediction to label
label_map = {0: "Neutral", 1: "Negative", 2: "Positive"}
print("Predicted Sentiment:", label_map[predicted_class])
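To report a confidence alongside the predicted label, the logits can be passed through a softmax before taking the argmax. A minimal pure-Python sketch with made-up logits (not actual model output):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

label_map = {0: "Neutral", 1: "Negative", 2: "Positive"}
logits = [0.3, -1.2, 2.5]   # made-up logits for illustration
probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)
print(label_map[best], round(probs[best], 3))  # Positive 0.881
```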
Model size: 147M parameters (F32, Safetensors)