# cardiffnlp/twitter-xlm-roberta-base-hate-spanish

This model is a fine-tuned version of [cardiffnlp/twitter-xlm-roberta-base](https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base), trained on the [`HaterNet`](https://zenodo.org/record/2592149) dataset and the Spanish subset of [`SemEval-2019 Task 5`](https://aclanthology.org/S19-2007/).

## The following metrics are achieved

* on the test split of `SemEval-2019 Task 5`

  - F1 (weighted): 0.7866
  - F1 (macro): 0.7935
  - Accuracy: 0.7937

* on a custom test split of `HaterNet`

  - F1 (weighted): 0.7815
  - F1 (macro): 0.6981
  - Accuracy: 0.7933

* on `HaterNet` and `SemEval-2019 Task 5`

  - F1 (weighted): 0.7908
  - F1 (macro): 0.7657
  - Accuracy: 0.7936
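
For readers unfamiliar with the difference between the two F1 averages reported above, here is a minimal sketch of how macro F1, weighted F1, and accuracy are commonly computed. The labels and predictions are toy values made up for illustration (they are not drawn from either dataset), the `HATE` label name is an assumption, and the helper names `f1_per_class` and `summarize` are hypothetical. This is not the evaluation script used for this model.

```python
from collections import Counter


def f1_per_class(y_true, y_pred, label):
    """F1 score for one class, treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def summarize(y_true, y_pred):
    """Return macro F1, support-weighted F1, and accuracy."""
    labels = sorted(set(y_true))
    support = Counter(y_true)  # number of true examples per class
    scores = {label: f1_per_class(y_true, y_pred, label) for label in labels}
    # Macro: every class counts equally, regardless of its size.
    macro = sum(scores.values()) / len(labels)
    # Weighted: each class counts in proportion to its support.
    weighted = sum(scores[label] * support[label] for label in labels) / len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"f1_macro": macro, "f1_weighted": weighted, "accuracy": accuracy}


# Toy, imbalanced example (illustrative values only; "HATE" is an assumed label name).
y_true = ["NOT-HATE"] * 8 + ["HATE"] * 2
y_pred = ["NOT-HATE"] * 8 + ["HATE", "NOT-HATE"]
metrics = summarize(y_true, y_pred)
print(metrics)
```

In this toy case the minority class is predicted less well than the majority class, so macro F1 comes out lower than weighted F1. A gap in that direction is what one would expect when classes are imbalanced and the smaller class is harder.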



### Usage
Install tweetnlp via pip.
```shell
pip install tweetnlp
```
Load the model in Python.
```python
import tweetnlp
model = tweetnlp.Classifier("cardiffnlp/twitter-xlm-roberta-base-hate-spanish")
model.predict('Ismael es egocentrico porque se vuelve loca si le dicen que tiene el pelo bonito😂😂😂😂 eso se define con otro objetivo #FirstDates251')
# >> {'label': 'NOT-HATE'}
```



### Datasets
@inproceedings{basile-etal-2019-semeval,
    title = "{S}em{E}val-2019 Task 5: Multilingual Detection of Hate Speech Against Immigrants and Women in {T}witter",
    author = "Basile, Valerio  and
      Bosco, Cristina  and
      Fersini, Elisabetta  and
      Nozza, Debora  and
      Patti, Viviana  and
      Rangel Pardo, Francisco Manuel  and
      Rosso, Paolo  and
      Sanguinetti, Manuela",
    booktitle = "Proceedings of the 13th International Workshop on Semantic Evaluation",
    month = jun,
    year = "2019",
    address = "Minneapolis, Minnesota, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/S19-2007",
    doi = "10.18653/v1/S19-2007",
    pages = "54--63",
}

@article{quijano2019haternet,
  title={HaterNet a system for detecting and analyzing hate speech in Twitter (Version 1.0)[Data set]},
  author={Quijano-Sanchez, Lara and Kohatsu, Juan Carlos Pereira and Liberatore, Federico and Camacho-Collados, Miguel},
  journal={Zenodo},
  year={2019}
}