---
license: mit
language:
- ar
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
tags:
- offensive language detection
base_model:
- UBC-NLP/MARBERT
---
This model is part of the work done in .
The full code can be found [here](https://github.com/wetey/cluster-errors).
## Model Details
### Model Description
- **Model type:** BERT-based
- **Language(s) (NLP):** Arabic
- **Finetuned from model:** UBC-NLP/MARBERT
## How to Get Started with the Model
Use the code below to get started with the model.
```python
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="wetey/MARBERT-LHSAB")
```
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("wetey/MARBERT-LHSAB")
model = AutoModelForSequenceClassification.from_pretrained("wetey/MARBERT-LHSAB")
```
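Under the hood, `AutoModelForSequenceClassification` returns one raw logit per class, and a softmax turns those logits into probabilities. The sketch below illustrates that post-processing step in pure Python, without downloading the model: the logit values are invented, and the label order (`normal`, `abusive`, `hate`) is an assumption based on the results table below — check `model.config.id2label` for the actual mapping.

```python
import math

def softmax(logits):
    """Convert raw classifier logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one input; real values come from
# model(**tokenizer(text, return_tensors="pt")).logits
logits = [2.0, 0.5, -1.0]
probs = softmax(logits)

# Assumed label order -- verify against model.config.id2label
labels = ["normal", "abusive", "hate"]
prediction = labels[probs.index(max(probs))]
```

With the toy logits above, the first class receives the highest probability, so `prediction` would be the first label in the assumed ordering.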
## Fine-tuning Details
### Fine-tuning Data
This model is fine-tuned on the [L-HSAB](https://github.com/Hala-Mulki/L-HSAB-First-Arabic-Levantine-HateSpeech-Dataset) dataset. The exact version we use (after removing duplicates) can be found []().
### Fine-tuning Procedure
The exact fine-tuning procedure followed can be found [here](https://github.com/wetey/cluster-errors/tree/master/finetuning).
#### Training Hyperparameters
```python
evaluation_strategy = 'epoch'
logging_steps = 1
num_train_epochs = 5
learning_rate = 1e-5
eval_accumulation_steps = 2
```
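These hyperparameters correspond to fields of `transformers.TrainingArguments`. The sketch below shows how they would be wired into a `Trainer`; the `output_dir` value is a hypothetical placeholder, and the model and datasets must be prepared as in the linked fine-tuning code — see that repository for the authoritative setup.

```python
from transformers import Trainer, TrainingArguments

# Sketch only: output_dir is hypothetical; all other defaults are left as-is.
training_args = TrainingArguments(
    output_dir="marbert-lhsab-finetuned",  # hypothetical path
    evaluation_strategy="epoch",
    logging_steps=1,
    num_train_epochs=5,
    learning_rate=1e-5,
    eval_accumulation_steps=2,
)

# trainer = Trainer(
#     model=model,                # fine-tuned MARBERT, prepared as in the repo
#     args=training_args,
#     train_dataset=train_dataset,
#     eval_dataset=eval_dataset,
# )
# trainer.train()
```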
## Evaluation
### Testing Data
The test set used can be found [here](https://github.com/wetey/cluster-errors/tree/master/data/datasets).
### Results
- `accuracy`: 87.9%
- `precision`: 88.1%
- `recall`: 87.9%
- `f1-score`: 87.9%
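As a reference for how the per-class numbers below are derived, here is a minimal pure-Python sketch of per-class precision, recall, and F1. The toy labels are invented for illustration only and are not drawn from the actual test set.

```python
def precision_recall_f1(y_true, y_pred, label):
    """Per-class precision, recall, and F1 from true vs. predicted labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Invented toy labels -- not the real evaluation data
y_true = ["normal", "abusive", "hate", "normal", "abusive"]
y_pred = ["normal", "abusive", "normal", "normal", "hate"]
p, r, f1 = precision_recall_f1(y_true, y_pred, "normal")
```

In practice the repository's evaluation code should be used; this sketch only makes the metric definitions concrete.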
#### Results per class
| Label | Precision | Recall | F1-score|
|---------|---------|---------|---------|
| normal | 85% | 82% | 83% |
| abusive | 93% | 92% | 93% |
| hate | 68% | 78% | 72% |
## Citation