---
license: mit
datasets:
- ucberkeley-dlab/measuring-hate-speech
language:
- en
metrics:
- accuracy
- f1
- precision
- recall
library_name: transformers
pipeline_tag: text-classification
tags:
- offensive language detection
base_model:
- distilbert/distilbert-base-uncased
---

This model is part of the work done in .
The full code can be found at https://github.com/wetey/cluster-errors.

## Model Details

### Model Description

- **Model type:** DistilBERT
- **Language(s) (NLP):** English
- **Finetuned from model:** distilbert-base-uncased

## How to Get Started with the Model

Use the code below to get started with the model.

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-classification", model="wetey/distilbert-base-uncased-measuring-hate-speech")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("wetey/distilbert-base-uncased-measuring-hate-speech")
model = AutoModelForSequenceClassification.from_pretrained("wetey/distilbert-base-uncased-measuring-hate-speech")
```
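Continuing from the pipeline above, a quick illustrative call; the sample sentence is made up, and the exact label strings depend on the `id2label` mapping in the model config:

```python
# Illustrative usage; `pipe` is the pipeline created above.
# The label name shown is an assumption based on the dataset's classes.
result = pipe("I hope you have a wonderful day!")
print(result)
# e.g. [{'label': 'supportive', 'score': 0.98}]
```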
## Fine-tuning Details

### Fine-tuning Data

The model was fine-tuned on the [ucberkeley-dlab/measuring-hate-speech](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech) dataset. We converted the continuous hate speech scores to categorical labels using the ranges suggested by the dataset authors; the ranges are listed on the [HuggingFace Dataset card](https://huggingface.co/datasets/ucberkeley-dlab/measuring-hate-speech).
Examples with hate speech scores lower than -1 are labeled `supportive`, scores between -1 and 0.5 are labeled `neutral`, and scores greater than 0.5 are labeled `hatespeech`.
We removed duplicate examples and examples that received fewer than three annotations in total, and we dropped the `neutral` class.
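A minimal sketch of this preprocessing, assuming pandas and the column names shown on the dataset card (`comment_id`, `hate_speech_score`, `text`); this is an illustration, not the authors' exact script (see the repository above for that):

```python
import pandas as pd
from datasets import load_dataset

# One row per annotation; hate_speech_score is the continuous score per comment.
df = load_dataset("ucberkeley-dlab/measuring-hate-speech")["train"].to_pandas()

# Keep comments that received at least three annotations.
df = df[df.groupby("comment_id")["comment_id"].transform("count") >= 3]

# Collapse to one row per comment and drop duplicate texts.
df = df.drop_duplicates(subset="comment_id").drop_duplicates(subset="text")

# Map continuous scores to labels using the suggested ranges.
def to_label(score):
    if score < -1:
        return "supportive"
    if score > 0.5:
        return "hatespeech"
    return "neutral"

df["label"] = df["hate_speech_score"].apply(to_label)

# Drop the neutral class.
df = df[df["label"] != "neutral"][["text", "label"]]
```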
After these steps, we were left with 12,289 examples: 7,497 labeled `supportive` and 4,792 labeled `hatespeech`. We used 85% of the dataset for fine-tuning and 15% for testing.

### Fine-tuning Procedure

The exact fine-tuning procedure can be found [here](https://github.com/wetey/cluster-errors/tree/master/finetuning).

#### Fine-tuning Hyperparameters

- `evaluation_strategy = 'epoch'`
- `logging_steps = 1`
- `num_train_epochs = 5`
- `learning_rate = 1e-5`
- `eval_accumulation_steps = 2`
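Expressed with the Hugging Face `Trainer` API, these settings would look roughly like the sketch below; unreported values (batch size, output path, and so on) are assumptions left at defaults or placeholders:

```python
# Sketch of the reported hyperparameters as TrainingArguments.
# Only the five values listed above come from this card; everything
# else is a default or a hypothetical placeholder.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-measuring-hate-speech",  # hypothetical path
    evaluation_strategy="epoch",
    logging_steps=1,
    num_train_epochs=5,
    learning_rate=1e-5,
    eval_accumulation_steps=2,
)
```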
## Evaluation

### Testing Data

The test set used can be found [here](https://github.com/wetey/cluster-errors/tree/master/data/datasets).

### Results

- `accuracy`: 89.3%
- `precision`: 89.4%
- `recall`: 89.3%
- `f1-score`: 89.3%
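These scores, and the per-class breakdown below, could be reproduced along the lines of the following sketch, assuming the test split is loaded as a pandas DataFrame with `text` and `label` columns and that the model's output labels match the dataset labels; the file path is hypothetical:

```python
import pandas as pd
from sklearn.metrics import classification_report
from transformers import pipeline

pipe = pipeline("text-classification",
                model="wetey/distilbert-base-uncased-measuring-hate-speech")

test = pd.read_csv("test.csv")  # hypothetical path to the test split
preds = [p["label"] for p in pipe(test["text"].tolist(), truncation=True)]

# Prints accuracy plus per-class precision, recall, and F1.
print(classification_report(test["label"], preds))
```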
#### Results per class

| Label | Precision | Recall | F1-score |
|------------|-----------|--------|----------|
| supportive | 92% | 91% | 91% |
| hatespeech | 86% | 87% | 86% |

## Citation