---
license: mit
language:
- ar
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
tags:
- offensive language detection
base_model:
- UBC-NLP/MARBERT
---

This model is part of the work done in .
The full code can be found at https://github.com/wetey/cluster-errors

## Model Details

### Model Description

- **Model type:** BERT-based
- **Language(s) (NLP):** Arabic
- **Finetuned from model:** UBC-NLP/MARBERT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="wetey/MARBERT-LHSAB")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("wetey/MARBERT-LHSAB")
model = AutoModelForSequenceClassification.from_pretrained("wetey/MARBERT-LHSAB")
```

## Fine-tuning Details

### Fine-tuning Data

This model is fine-tuned on the [L-HSAB](https://github.com/Hala-Mulki/L-HSAB-First-Arabic-Levantine-HateSpeech-Dataset) dataset. The exact version we use (after removing duplicates) can be found []().

### Fine-tuning Procedure

The exact fine-tuning procedure followed can be found [here](https://github.com/wetey/cluster-errors/tree/master/finetuning).

#### Training Hyperparameters

- `evaluation_strategy` = 'epoch'
- `logging_steps` = 1
- `num_train_epochs` = 5
- `learning_rate` = 1e-5
- `eval_accumulation_steps` = 2

## Evaluation

### Testing Data

The test set used can be found [here](https://github.com/wetey/cluster-errors/tree/master/data/datasets).

### Results

`accuracy`: 87.9%
`precision`: 88.1%
`recall`: 87.9%
`f1-score`: 87.9%
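As a minimal sketch, scores in this style can be recomputed with scikit-learn's `classification_report`. This is illustrative only, not the released evaluation code: the file name `test.csv`, the `text`/`label` column names, and the assumption that the gold labels use the same strings the model predicts are all hypothetical.

```python
# Hedged evaluation sketch (file name and column names are hypothetical;
# adapt them to the actual test set linked above).
import pandas as pd
from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("text-classification", model="wetey/MARBERT-LHSAB")

# Hypothetical test file with `text` and `label` columns.
test = pd.read_csv("test.csv")

# The pipeline returns one {"label": ..., "score": ...} dict per input;
# truncation guards against inputs longer than the model's max length.
predictions = [output["label"] for output in pipe(test["text"].tolist(), truncation=True)]

# The per-class rows and averaged scores of this report correspond to the
# kind of figures reported above.
print(classification_report(test["label"], predictions, digits=3))
```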
#### Results per class

| Label   | Precision | Recall | F1-score |
|---------|-----------|--------|----------|
| normal  | 85%       | 82%    | 83%      |
| abusive | 93%       | 92%    | 93%      |
| hate    | 68%       | 78%    | 72%      |

## Citation