---
license: apache-2.0
language:
- ar
pipeline_tag: text-classification
datasets:
- labr
widget:
- text: من أفضل الكتب التي قرأتها في هذا العام
  example_title: Positive
- text: الكتاب سيء، لا أنصح أحد بقراءته أبدا
  example_title: Negative
- text: لا يمكنك الجزم بشيء حول هذا الكتاب
  example_title: Neutral
metrics:
- precision
- recall
- f1
library_name: transformers
tags:
- code
- sentiment analysis
- sentiment-analysis
---

# Introduction
This model predicts whether the sentiment of a text is Positive, Neutral, or Negative.
It is a fine-tuned version of [UBC-NLP/MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) on the [labr](https://huggingface.co/datasets/labr) dataset.

# Data
The data used is [labr](https://huggingface.co/datasets/labr), a dataset of Arabic book reviews.
The sentiment label is derived from the number of stars given in each review.

| Number of stars | Sentiment |
|-----------------|-----------|
| 1-2             | Negative  |
| 3               | Neutral   |
| 4-5             | Positive  |
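
For reference, here is a minimal sketch of the mapping described in the table, written in terms of raw star ratings (1–5); the labr `label` field stores these shifted to 0–4, which is handled by the mapping in the Training code section below:

```python
# Minimal sketch of the stars-to-sentiment mapping from the table above.
def stars_to_sentiment(stars: int) -> str:
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"

assert stars_to_sentiment(1) == "Negative"
assert stars_to_sentiment(3) == "Neutral"
assert stars_to_sentiment(5) == "Positive"
```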

# Training
Using the Arabic pre-trained [MARBERTv2](https://huggingface.co/UBC-NLP/MARBERTv2) as a base, we fine-tuned the model for a classification task.
Training ran for 3 epochs using the Hugging Face Trainer on Google Colab.
This is a POC experiment, so the training hyper-parameters were not optimized.
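
The full script in the Training code section below relies on the Trainer defaults, which amount to 3 training epochs. An equivalent, more explicit configuration would look roughly like this (a sketch, not the exact arguments used):

```python
# Sketch of the (non-optimized) training setup; num_train_epochs=3 simply
# makes the Trainer default explicit.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="test-trainer",
    num_train_epochs=3,
    evaluation_strategy="epoch",
)
```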

# Evaluation
The model was evaluated on the test set of [labr](https://huggingface.co/datasets/labr), using the same preprocessing steps as in training.
Please note that the following scores are macro averages.

| Metric    | Score |
|-----------|-------|
| Precision | 0.663 |
| Recall    | 0.662 |
| F1        | 0.66  |
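
The original evaluation script is not included here; the following is a minimal sketch of how comparable numbers could be reproduced with scikit-learn. It assumes the published model's `id2label` maps predictions to the `Positive`/`Neutral`/`Negative` strings; if not, the predicted labels need to be translated first.

```python
# Minimal evaluation sketch (not the original script).
import datasets
from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("text-classification", model="AbdallahNasir/book-review-sentiment-classification")

# Same rating-to-sentiment mapping as in the Training code section below.
rate_to_sentiment = {0: "Negative", 1: "Negative", 2: "Neutral", 3: "Positive", 4: "Positive"}
test = datasets.load_dataset("labr", split="test")

y_true = [rate_to_sentiment[label] for label in test["label"]]
y_pred = [out["label"] for out in pipe(test["text"], truncation=True, max_length=512, batch_size=32)]

# The "macro avg" row corresponds to the precision/recall/F1 reported above.
print(classification_report(y_true, y_pred, digits=3))
```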

# Using the model
To use the model in your code, follow the Hugging Face instructions, or use the `pipeline` API directly:
```python
from transformers import pipeline

pipe = pipeline("text-classification", model="AbdallahNasir/book-review-sentiment-classification")
result = pipe("من أفضل الكتب التي قرأتها في هذا العام")
print(result)
```
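
The pipeline returns a list with one dictionary per input, each containing a predicted label and a confidence score; the exact label strings depend on the `id2label` mapping stored in the model config.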

# Training code
Running the following code reproduces the results above. You can run it in Google Colab; please use a GPU runtime to finish training quickly.

```python
# Notebook only:
!pip install transformers[torch] datasets

# Download and load the data
import datasets
dataset = datasets.load_dataset("labr")

# Transform the ratings into Sentiment
POSITIVE = "Positive"
NEUTRAL = "Neutral"
NEGATIVE = "Negative"
rate_to_sentiment = {0: NEGATIVE, 1: NEGATIVE, 2: NEUTRAL, 3: POSITIVE, 4: POSITIVE}
dataset = dataset.map(lambda example: {"sentiment": rate_to_sentiment[example["label"]]}, remove_columns=["label"])
dataset = dataset.rename_column("sentiment", "label")
class_names = [POSITIVE, NEUTRAL, NEGATIVE]  
num_classes = len(class_names)
dataset = dataset.cast_column('label', datasets.ClassLabel(num_classes=num_classes, names=class_names))

# Download and load the pre-trained model and tokenizer
from transformers import AutoModelForSequenceClassification, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("UBC-NLP/MARBERTv2")
model = AutoModelForSequenceClassification.from_pretrained("UBC-NLP/MARBERTv2", num_labels=3)

# Tokenize data for training
def tokenize_function(examples):
    return tokenizer(examples["text"], truncation=True, max_length=512, return_attention_mask=True, return_length=True)
tokenized_datasets = dataset.map(tokenize_function, batched=False, num_proc=16)

# Define data collator, useful for training and batching.
from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Define the training arguments
from transformers import TrainingArguments, Trainer
training_args = TrainingArguments("test-trainer", evaluation_strategy="epoch")

# Create the trainer
trainer = Trainer(
    model,
    training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    data_collator=data_collator,
    tokenizer=tokenizer,
)

# Train and save
trainer.train()
trainer.save_model("final_output")
```

##### Keywords
* sentiment analysis
* arabic
* book reviews