---
license: mit
language:
- ar
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
tags:
- offensive language detection
base_model:
- UBC-NLP/MARBERT
---

This model is part of the work done in <!-- add paper name -->.

The full code is available at https://github.com/wetey/cluster-errors.

## Model Details

### Model Description

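This is a MARBERT model fine-tuned for offensive language detection. It classifies Arabic text as `normal`, `abusive`, or `hate`, and was fine-tuned on L-HSAB, a Levantine-dialect dataset.

- **Model type:** BERT-based
- **Language(s) (NLP):** Arabic
- **Fine-tuned from model:** UBC-NLP/MARBERT
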
## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="wetey/MARBERT-LHSAB")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("wetey/MARBERT-LHSAB")
model = AutoModelForSequenceClassification.from_pretrained("wetey/MARBERT-LHSAB")
```
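
The snippets above only load the model. The sketch below shows one way to turn the model's raw logits into per-class probabilities. It assumes PyTorch is installed (the card does not list dependencies), and it reads label names from the checkpoint's `config.id2label`, which we have not inspected here: they may be `normal`/`abusive`/`hate` or generic `LABEL_0`-style names. The helpers `softmax`, `decode`, and `classify` are illustrative names, not part of any library.

```python
import math
from typing import Dict, List, Sequence

MODEL_ID = "wetey/MARBERT-LHSAB"


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: Sequence[float], id2label: Dict[int, str]) -> Dict[str, float]:
    """Map one row of logits to {label name: probability}."""
    return {id2label[i]: p for i, p in enumerate(softmax(logits))}


def classify(texts: List[str], model_id: str = MODEL_ID) -> List[Dict[str, float]]:
    """Score raw texts with the fine-tuned model (downloads weights on first call)."""
    # Imported lazily so the helpers above work without torch/transformers installed.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, num_labels)
    return [decode(row.tolist(), model.config.id2label) for row in logits]
```

Calling `classify(texts)` returns one dictionary per input text; the predicted class is the key with the highest probability, e.g. `max(d, key=d.get)`.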

## Fine-tuning Details

### Fine-tuning Data

This model is fine-tuned on the [L-HSAB](https://github.com/Hala-Mulki/L-HSAB-First-Arabic-Levantine-HateSpeech-Dataset) dataset. The exact version we use (after removing duplicates) can be found [](). <!--TODO-->
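
The card does not say how duplicates were identified. As a hedged illustration only, the sketch below assumes exact string matching after stripping surrounding whitespace, keeps the first occurrence of each text, and flags texts that appear with conflicting labels. The function name `deduplicate` and the `(text, label)` row format are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Example = Tuple[str, str]  # (text, label)


def deduplicate(rows: List[Example]) -> Tuple[List[Example], Set[str]]:
    """Drop exact duplicate texts, keeping the first occurrence.

    Returns the deduplicated rows and the set of texts that appeared
    with more than one distinct label (worth inspecting by hand).
    """
    first_label: Dict[str, str] = {}
    kept: List[Example] = []
    conflicts: Set[str] = set()
    for text, label in rows:
        key = text.strip()  # assumption: whitespace-only differences count as duplicates
        if key in first_label:
            if first_label[key] != label:
                conflicts.add(key)
            continue
        first_label[key] = label
        kept.append((key, label))
    return kept, conflicts
```

Whether conflicting duplicates should be dropped entirely or resolved (for example by majority vote) is a design choice this sketch leaves to the caller.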

### Fine-tuning Procedure

The exact fine-tuning procedure we followed is available [here](https://github.com/wetey/cluster-errors/tree/master/finetuning).

#### Training Hyperparameters

The following `TrainingArguments` values were set:

- `evaluation_strategy`: `'epoch'`
- `logging_steps`: `1`
- `num_train_epochs`: `5`
- `learning_rate`: `1e-5`
- `eval_accumulation_steps`: `2`
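
One way to express these values in code is sketched below, assuming the Hugging Face `Trainer` API. Two caveats: any argument not listed above (batch size, weight decay, warmup, seed) is left at the library default here, which may not match the original run; and recent `transformers` releases renamed `evaluation_strategy` to `eval_strategy`, so the helper picks whichever name the installed version accepts. The `make_training_args` name and its default `output_dir` are hypothetical.

```python
import inspect

# Values listed in this card; everything else is left at library defaults (assumption).
HYPERPARAMS = {
    "num_train_epochs": 5,
    "learning_rate": 1e-5,
    "logging_steps": 1,            # log the training loss after every optimizer step
    "eval_accumulation_steps": 2,  # move eval predictions to CPU every 2 steps to save GPU memory
}
EVAL_STRATEGY = "epoch"  # evaluate once at the end of each epoch


def make_training_args(output_dir: str = "marbert-lhsab-finetuned"):
    """Build TrainingArguments from the card's hyperparameters."""
    from transformers import TrainingArguments  # imported lazily

    params = inspect.signature(TrainingArguments.__init__).parameters
    # Older releases call this argument `evaluation_strategy`; newer ones `eval_strategy`.
    key = "eval_strategy" if "eval_strategy" in params else "evaluation_strategy"
    return TrainingArguments(output_dir=output_dir, **{key: EVAL_STRATEGY}, **HYPERPARAMS)
```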

## Evaluation

### Testing Data

The test set is available [here](https://github.com/wetey/cluster-errors/tree/master/data/datasets).

### Results

Overall scores on the test set (precision, recall, and F1 are weighted averages across classes):

- `accuracy`: 87.9%
- `precision`: 88.1%
- `recall`: 87.9%
- `f1-score`: 87.9%

#### Results per class

| Label   | Precision | Recall | F1-score |
|---------|-----------|--------|----------|
| normal  | 85%       | 82%    | 83%      |
| abusive | 93%       | 92%    | 93%      |
| hate    | 68%       | 78%    | 72%      |
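
To make the link between the per-class rows and the overall scores concrete, the sketch below recomputes per-class precision, recall, and F1, plus their macro and support-weighted averages, from lists of gold and predicted labels, in the spirit of scikit-learn's `classification_report`. The function name `per_class_report` is hypothetical and the labels in the test are made up. One property worth noting: the support-weighted recall always equals accuracy, which is why the overall `recall` and `accuracy` above coincide.

```python
from collections import Counter
from typing import Dict, List


def per_class_report(y_true: List[str], y_pred: List[str]) -> Dict[str, Dict[str, float]]:
    """Per-class precision/recall/F1 plus macro and support-weighted averages."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct predictions per class
    pred_count = Counter(y_pred)  # how often each class was predicted
    support = Counter(y_true)     # how often each class occurs in the gold labels

    report: Dict[str, Dict[str, float]] = {}
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / support[c] if support[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1, "support": support[c]}

    n = len(y_true)
    for avg in ("macro", "weighted"):
        report[avg] = {}
        for m in ("precision", "recall", "f1"):
            if avg == "macro":  # unweighted mean over classes
                report[avg][m] = sum(report[c][m] for c in labels) / len(labels)
            else:               # mean weighted by each class's support
                report[avg][m] = sum(report[c][m] * support[c] for c in labels) / n
    report["accuracy"] = {"value": sum(tp.values()) / n}
    return report
```

By the same arithmetic, an unweighted (macro) mean of the rounded per-class F1 scores in the table is roughly (83 + 93 + 72) / 3 ≈ 82.7%, below the weighted 87.9%, because the weighted average leans toward classes with more test examples.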

## Citation

<!--TODO-->