# MARBERT-LHSAB
---
license: mit
language:
  - ar
metrics:
  - accuracy
  - f1
  - precision
  - recall
library_name: transformers
tags:
  - offensive language detection
base_model:
  - UBC-NLP/MARBERT
---

This model is part of the work done in .
The full code can be found at https://github.com/wetey/cluster-errors

## Model Details

### Model Description

- **Model type:** BERT-based
- **Language(s) (NLP):** Arabic
- **Finetuned from model:** UBC-NLP/MARBERT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="wetey/MARBERT-LHSAB")
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("wetey/MARBERT-LHSAB")
model = AutoModelForSequenceClassification.from_pretrained("wetey/MARBERT-LHSAB")
```
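When loading the model directly (rather than through `pipeline`), the forward pass returns raw logits, one per class. A minimal sketch of turning logits into probabilities with a softmax; the example logit values and the index-to-label mapping here are illustrative assumptions, not values read from the model config:

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence; in practice they come from
# model(**tokenizer(text, return_tensors="pt")).logits
logits = [2.1, -0.3, 0.4]
probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```

The predicted index is then mapped to a label name via the model's `id2label` config entry.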

## Fine-tuning Details

### Fine-tuning Data

This model is fine-tuned on the L-HSAB dataset. The exact version we use (after removing duplicates) can be found .

### Fine-tuning Procedure

The exact fine-tuning procedure followed can be found here.

### Training Hyperparameters

```python
evaluation_strategy = 'epoch'
logging_steps = 1
num_train_epochs = 5
learning_rate = 1e-5
eval_accumulation_steps = 2
```
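Assuming the standard Hugging Face `Trainer` API was used, the hyperparameters above map onto a `TrainingArguments` object like the following sketch; `output_dir` and the dataset wiring are assumptions not stated in this card:

```python
from transformers import TrainingArguments, Trainer

# Mirrors the hyperparameters listed above; output_dir is an
# assumed placeholder, not taken from the card.
training_args = TrainingArguments(
    output_dir="marbert-lhsab",
    evaluation_strategy="epoch",
    logging_steps=1,
    num_train_epochs=5,
    learning_rate=1e-5,
    eval_accumulation_steps=2,
)

# trainer = Trainer(
#     model=model,                  # AutoModelForSequenceClassification from above
#     args=training_args,
#     train_dataset=train_dataset,  # tokenized L-HSAB train split (assumed)
#     eval_dataset=eval_dataset,    # tokenized L-HSAB validation split (assumed)
# )
# trainer.train()
```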

## Evaluation

### Testing Data

The test set used can be found here.

### Results

- accuracy: 87.9%
- precision: 88.1%
- recall: 87.9%
- f1-score: 87.9%

### Results per class

| Label   | Precision | Recall | F1-score |
|---------|-----------|--------|----------|
| normal  | 85%       | 82%    | 83%      |
| abusive | 93%       | 92%    | 93%      |
| hate    | 68%       | 78%    | 72%      |
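Note that the headline precision/recall/f1 above are higher than the unweighted mean of the per-class scores, which suggests support-weighted averaging dominated by the stronger, larger classes. A quick pure-Python check of the macro (unweighted) averages, using the values from the table:

```python
# Per-class scores from the table above (percent).
precision = {"normal": 85, "abusive": 93, "hate": 68}
recall    = {"normal": 82, "abusive": 92, "hate": 78}
f1        = {"normal": 83, "abusive": 93, "hate": 72}

def macro(scores):
    """Macro average: unweighted mean over classes."""
    return sum(scores.values()) / len(scores)

macro_precision = macro(precision)  # 82.0
macro_recall = macro(recall)        # 84.0
macro_f1 = macro(f1)                # ~82.7
```

The gap between the macro f1 (~82.7%) and the reported f1 (87.9%) is driven mostly by the weaker hate class.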

## Citation