# DarijaBERT Fine-Tuned for Sentiment Analysis
This sentiment analysis model is based on DarijaBERT, a language model pretrained on Moroccan Arabic (Darija) text.
The model has been fine-tuned to classify Moroccan Arabic tweets and public comments into three sentiment categories:
- Positive (2)
- Neutral (0)
- Negative (1)
## Model Architecture
The base DarijaBERT architecture was extended with:
- Two fully connected layers of 1024 neurons each
- Dropout layer (p=0.3) to enhance generalization
- Final classification layer with 3 output neurons (one for each sentiment class)
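As a rough illustration, the head described above could look like the following PyTorch module. This is a sketch, not the released implementation: the base checkpoint name (`SI2M-Lab/DarijaBERT`), the ReLU activations, and the exact placement of the dropout layer are assumptions.

```python
import torch.nn as nn
from transformers import AutoModel

class DarijaBertSentimentHead(nn.Module):
    """Illustrative head: two 1024-unit dense layers, dropout (p=0.3), 3-class output."""
    def __init__(self, base_name="SI2M-Lab/DarijaBERT", num_labels=3):  # assumed base checkpoint
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_name)
        hidden = self.bert.config.hidden_size  # 768 for BERT-base-sized models
        self.classifier = nn.Sequential(
            nn.Linear(hidden, 1024),
            nn.ReLU(),
            nn.Linear(1024, 1024),
            nn.ReLU(),
            nn.Dropout(p=0.3),
            nn.Linear(1024, num_labels),
        )

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]  # [CLS] token representation
        return self.classifier(cls)
```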
## Pretraining Details
- Dataset: 17,441 Moroccan tweets
  - 9,894 positive tweets (56.73%)
  - 4,039 neutral tweets (23.16%)
  - 3,508 negative tweets (20.11%)
- Training Framework: Hugging Face Trainer API
- Hyperparameters (see the Trainer sketch below):
  - Learning rate: `1e-5`
  - Batch size: `16` (with gradient accumulation = 32)
  - Weight decay: `0.01`
  - EarlyStoppingCallback: training stopped automatically at 92% accuracy
  - Epochs: up to 20
- Evaluation Strategy: evaluated after every epoch, best model saved
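The settings above map onto the Hugging Face Trainer API roughly as follows. This is a minimal sketch, not the exact training script: it assumes the public `SI2M-Lab/DarijaBERT` checkpoint with the stock sequence-classification head (rather than the custom head described earlier), and the early-stopping patience is an assumption since the card only states that training stopped around 92% accuracy. The tokenized `train_dataset` / `eval_dataset` objects are placeholders, as the tweet dataset is not distributed with this card.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          EarlyStoppingCallback, Trainer, TrainingArguments)

# Assumption: start from the public DarijaBERT base checkpoint with a fresh 3-class head.
model = AutoModelForSequenceClassification.from_pretrained("SI2M-Lab/DarijaBERT", num_labels=3)
tokenizer = AutoTokenizer.from_pretrained("SI2M-Lab/DarijaBERT")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": accuracy_score(labels, np.argmax(logits, axis=-1))}

training_args = TrainingArguments(
    output_dir="darijabert-sentiment",
    learning_rate=1e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=32,
    weight_decay=0.01,
    num_train_epochs=20,
    eval_strategy="epoch",             # evaluate after every epoch ("evaluation_strategy" in older releases)
    save_strategy="epoch",
    load_best_model_at_end=True,       # keep the best checkpoint
    metric_for_best_model="accuracy",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,       # placeholder: tokenized tweet splits (dataset not public)
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # patience value is an assumption
)
trainer.train()
```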
Performance:
- Accuracy: 87%
- F1 Score: 87%
- Precision: 88%
- Cohen's Kappa: 0.80
## Fine-Tuning Details (Cash Transfer Public Policy 2023)
- Dataset: 1,344 Moroccan comments from YouTube and Hespress
  - 515 neutral
  - 505 negative
  - 324 positive
- Split: 80% training / 20% testing
- Hyperparameters:
  - Learning rate: `5e-6`
  - Batch size: `32`
  - Maximum sequence length: `256` tokens
  - Warmup ratio: `0.1`
  - Early stopping enabled
  - Class weights adjusted for imbalance (see the sketch below)
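Adjusting class weights for imbalance is commonly done by weighting the cross-entropy loss inside a Trainer subclass. The sketch below shows one way to do this; the inverse-frequency weighting uses the class counts listed above, but the exact scheme used for this model is an assumption.

```python
import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    """Trainer variant that applies per-class weights in the cross-entropy loss."""
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss_fct = nn.CrossEntropyLoss(weight=self.class_weights.to(outputs.logits.device))
        loss = loss_fct(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

# Illustrative inverse-frequency weights from the 515 / 505 / 324 neutral / negative / positive counts.
counts = torch.tensor([515.0, 505.0, 324.0])   # label order: 0=Neutral, 1=Negative, 2=Positive
class_weights = counts.sum() / (len(counts) * counts)
```

`WeightedLossTrainer` would then stand in for `Trainer` in a setup like the earlier sketch, with `learning_rate=5e-6`, `per_device_train_batch_size=32`, `warmup_ratio=0.1`, and `max_length=256` passed to the tokenizer.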
Performance:
- Accuracy: 91.6%
- Precision: 0.916
- Recall: 0.916
- F1 Score: 0.916
- Cohen's Kappa: 0.872
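Scores like these can be computed from predictions on the 20% test split with scikit-learn. A small sketch follows; the weighted averaging is an assumption, as the card does not state whether macro or weighted averaging was used.

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             precision_recall_fscore_support)

def report(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, and Cohen's kappa from gold labels and predictions."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted"  # assumption: weighted averaging
    )
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "cohen_kappa": cohen_kappa_score(y_true, y_pred),
    }
```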
## How to Use the Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("monsifnadir/DarijaBERT-For-Sentiment-Analysis")
tokenizer = AutoTokenizer.from_pretrained("monsifnadir/DarijaBERT-For-Sentiment-Analysis")

# Example Darija input: "I'm very happy today, thank God"
text = "فرحت بزاف اليوم الحمد لله"

inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    outputs = model(**inputs)
predicted_class = outputs.logits.argmax(dim=-1).item()

# Map prediction to label
label_map = {0: "Neutral", 1: "Negative", 2: "Positive"}
print("Predicted Sentiment:", label_map[predicted_class])
```