# MARBERT-LHSAB
---
license: mit
language:
  - ar
metrics:
  - accuracy
  - f1
  - precision
  - recall
library_name: transformers
tags:
  - offensive language detection
base_model:
  - UBC-NLP/MARBERT
---

This model is part of the work done in .
The full code can be found at https://github.com/wetey/cluster-errors

## Model Details

### Model Description

- **Model type:** BERT-based
- **Language(s) (NLP):** Arabic
- **Finetuned from model:** UBC-NLP/MARBERT

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="wetey/MARBERT-LHSAB")
```

```python
# Load the model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("wetey/MARBERT-LHSAB")
model = AutoModelForSequenceClassification.from_pretrained("wetey/MARBERT-LHSAB")
```
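When loading the model directly (rather than through `pipeline`), the forward pass returns raw logits, one per class. A minimal sketch of turning logits into probabilities with a softmax; the example logit values and the index-to-label mapping here are illustrative assumptions, not values read from the model config:

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for one sentence; in practice they come from
# model(**tokenizer(text, return_tensors="pt")).logits
logits = [2.1, -0.3, 0.4]
probs = softmax(logits)
predicted_index = max(range(len(probs)), key=probs.__getitem__)
```

The predicted index is then mapped to a label name via the model's `id2label` config entry.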

## Fine-tuning Details

### Fine-tuning Data

This model is fine-tuned on the L-HSAB dataset. The exact version we use (after removing duplicates) can be found .

### Fine-tuning Procedure

The exact fine-tuning procedure followed can be found here.

### Training Hyperparameters

```python
evaluation_strategy = 'epoch'
logging_steps = 1
num_train_epochs = 5
learning_rate = 1e-5
eval_accumulation_steps = 2
```
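Assuming the standard Hugging Face `Trainer` API was used, the hyperparameters above map onto a `TrainingArguments` object like the following sketch; `output_dir` and the dataset wiring are assumptions not stated in this card:

```python
from transformers import TrainingArguments, Trainer

# Mirrors the hyperparameters listed above; output_dir is an
# assumed placeholder, not taken from the card.
training_args = TrainingArguments(
    output_dir="marbert-lhsab",
    evaluation_strategy="epoch",
    logging_steps=1,
    num_train_epochs=5,
    learning_rate=1e-5,
    eval_accumulation_steps=2,
)

# trainer = Trainer(
#     model=model,                  # AutoModelForSequenceClassification from above
#     args=training_args,
#     train_dataset=train_dataset,  # tokenized L-HSAB train split (assumed)
#     eval_dataset=eval_dataset,    # tokenized L-HSAB validation split (assumed)
# )
# trainer.train()
```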

## Evaluation

### Testing Data

The test set used can be found here.

### Results

- accuracy: 87.9%
- precision: 88.1%
- recall: 87.9%
- f1-score: 87.9%

### Results per class

| Label   | Precision | Recall | F1-score |
|---------|-----------|--------|----------|
| normal  | 85%       | 82%    | 83%      |
| abusive | 93%       | 92%    | 93%      |
| hate    | 68%       | 78%    | 72%      |
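Note that the headline precision/recall/f1 above are higher than the unweighted mean of the per-class scores, which suggests support-weighted averaging dominated by the stronger, larger classes. A quick pure-Python check of the macro (unweighted) averages, using the values from the table:

```python
# Per-class scores from the table above (percent).
precision = {"normal": 85, "abusive": 93, "hate": 68}
recall    = {"normal": 82, "abusive": 92, "hate": 78}
f1        = {"normal": 83, "abusive": 93, "hate": 72}

def macro(scores):
    """Macro average: unweighted mean over classes."""
    return sum(scores.values()) / len(scores)

macro_precision = macro(precision)  # 82.0
macro_recall = macro(recall)        # 84.0
macro_f1 = macro(f1)                # ~82.7
```

The gap between the macro f1 (~82.7%) and the reported f1 (87.9%) is driven mostly by the weaker hate class.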

## Citation