LGBeTO_detection_Model
This is the LGBeTO model, a fine-tuned version of dccuchile/bert-base-spanish-wwm-uncased (Cañete et al., 2023). It achieves the following results on the evaluation set:
- Accuracy: 0.835
- F1: 0.8533
- Precision: 0.8205
- Recall: 0.8889
Authors
- Developed by: Claudia Martínez-Araneda, Mariella Gutiérrez V., Pedro Gómez M., Diego Maldonado M., Alejandra Segura N., Christian Vidal-Castro
- Model type: BERT-based model for text classification and sentiment analysis.
- Language(s) (NLP): Spanish
- License: CC BY 4.0
- Finetuned from model: BETO (Cañete et al., 2023)
Cite as:
@misc{claudia_martínez-araneda_2025,
  author    = {Claudia Martínez-Araneda and Mariella Gutiérrez V. and Pedro Gómez M. and Diego Maldonado M. and Alejandra Segura N. and Christian Vidal-Castro},
  title     = {LGBeTO_detection_Model (Revision a8b5b38)},
  year      = 2025,
  url       = {https://huggingface.co/LaProfeClaudis/LGBeTO_detection_Model},
  doi       = {10.57967/hf/5406},
  publisher = {Hugging Face}
}
Model description
LGBeTO was designed to detect discriminatory or hateful language directed toward the LGBTQIA+ community, aiming to support safer and more inclusive online environments.
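A minimal inference sketch using the transformers pipeline API is shown below. The repository id is taken from this card; the label names returned by the pipeline are not documented here, so the example output is only indicative.

```python
from transformers import pipeline

# Load the fine-tuned classifier from the Hub (repository id from this card).
classifier = pipeline(
    "text-classification",
    model="LaProfeClaudis/LGBeTO_detection_Model",
)

# Label names (e.g. LABEL_0 / LABEL_1) are not documented in this card and may differ.
print(classifier("Texto de ejemplo a clasificar."))
# e.g. [{'label': 'LABEL_1', 'score': 0.97}]
```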
Intended uses & limitations
This model was created for a study conducted strictly for academic and research purposes. The targets of hate speech have been anonymised, and there is no intent to harm the perpetrators in any way. We prioritise protecting the privacy and confidentiality of vulnerable individuals: identifying data such as user IDs, phone numbers, and addresses was carefully removed before the data was shared with our annotators. All of the data collected comes from public sources.
As authors, we affirm our deep respect for all individuals and explicitly state that we have no intention of prejudicing, biasing, or disrespecting the LGBTQIA+ community or any group. Our work seeks to contribute constructively to inclusive and ethical research in artificial intelligence.
Training and evaluation data
LGBeTO was fine-tuned on comments collected from digital media such as Twitter, Instagram, YouTube, and other websites. The dataset is available in the Zenodo repository.
Cite as: Martínez-Araneda, C., Maldonado Montiel, D., Gutiérrez Valenzuela, M., Gómez Meneses, P., Segura Navarrete, A., & Vidal-Castro, C. (2025). LGBTQIAphobia dataset (augmented and balanced) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15385622
Training procedure
- Step 1: Load the dataset
- Step 2: Tokenization and model generation
- Step 3: Train-validation split
- Step 4: Training configuration
- Step 5: Training and evaluation (see the sketches below)
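A minimal sketch of steps 1-3 follows. The exact preprocessing script is not published in this card, so the CSV file name, the `text`/`label` column names, the maximum sequence length, and the split ratio are assumptions; the seed (42) matches the hyperparameters listed below.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Step 1: load the dataset (hypothetical local CSV export of the Zenodo data).
dataset = load_dataset("csv", data_files="lgbtqiaphobia_dataset.csv")["train"]

# Step 2: tokenize with the tokenizer of the BETO base model.
tokenizer = AutoTokenizer.from_pretrained("dccuchile/bert-base-spanish-wwm-uncased")

def tokenize(batch):
    # max_length=128 is an assumption; it is not reported in this card.
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

dataset = dataset.map(tokenize, batched=True)

# Step 3: train/validation split (seed 42 as reported below; the 80/20 ratio is an assumption).
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```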
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: AdamW (ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- num_epochs: 3
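The sketch below mirrors steps 4-5 with the hyperparameters listed above. `train_ds`, `eval_ds`, and `tokenizer` refer to the previous sketch; `output_dir` and `num_labels=2` are assumptions not stated in this card.

```python
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

# Classification head on top of BETO (num_labels=2 is an assumption).
model = AutoModelForSequenceClassification.from_pretrained(
    "dccuchile/bert-base-spanish-wwm-uncased", num_labels=2
)

# Step 4: training configuration with the reported hyperparameters.
training_args = TrainingArguments(
    output_dir="lgbeto-detection",   # hypothetical output directory
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    seed=42,
    eval_strategy="epoch",           # one evaluation per epoch, as in the table below
)

# Step 5: training and evaluation. The Trainer defaults to the torch AdamW
# optimizer with the betas/epsilon listed above.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)

trainer.train()
print(trainer.evaluate())
```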
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     | Precision | Recall |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|:---------:|:------:|
| 0.4655        | 1.0   | 50   | 0.5517          | 0.755    | 0.7538 | 0.8242    | 0.6944 |
| 0.1928        | 2.0   | 100  | 0.4830          | 0.825    | 0.8523 | 0.7829    | 0.9352 |
| 0.0718        | 3.0   | 150  | 0.5393          | 0.835    | 0.8533 | 0.8205    | 0.8889 |
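The per-epoch columns correspond to standard classification metrics. The exact metric implementation is not specified in this card; a typical compute_metrics function, assuming binary labels with F1, precision, and recall computed on the positive class, looks like this:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair produced by the Trainer at evaluation time.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds),
        "precision": precision_score(labels, preds),
        "recall": recall_score(labels, preds),
    }

# Passing compute_metrics=compute_metrics to the Trainer sketched above would
# produce these columns at every evaluation step.
```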
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1