This model is a fine-tuned version of xlm-roberta-large designed for binary toxicity classification in multilingual contexts.
It predicts:
- 0 → Non-toxic
- 1 → Toxic
The model was tuned over more than 50 hyperparameter experiments and benchmarked against two public baselines. It accepts multilingual input, which makes it suited to moderation tasks on globally distributed platforms.
1. Training Details
- Base Model: FacebookAI/xlm-roberta-large
- Task: Sequence classification (Binary)
- Loss: Cross-entropy with class weights
- Datasets: Combination of multiple multilingual and toxicity datasets (details below)
- Training Epochs: 10 (with early stopping)
- Eval Metric: Best model selected based on weighted precision
- Optimized Hyperparameters: Learning rate, warmup ratio, weight decay, batch size & gradient accumulation
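To illustrate the "cross-entropy with class weights" entry above, here is a minimal, dependency-free sketch of how such a loss is typically computed. The class counts and the inverse-frequency weighting scheme are assumptions made for illustration; the model card does not state the actual weights used in training. The normalisation (dividing by the total weight of the batch) mirrors the weighted-mean reduction commonly used by deep learning frameworks.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_cross_entropy(batch_logits, labels, class_weights):
    """Weighted mean of per-sample negative log-likelihoods.

    Each sample's loss is scaled by the weight of its true class, and the
    sum is normalised by the total weight of the batch.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for logits, label in zip(batch_logits, labels):
        prob_true = softmax(logits)[label]
        w = class_weights[label]
        weighted_sum += -w * math.log(prob_true)
        weight_total += w
    return weighted_sum / weight_total

# Hypothetical class counts (NOT from the model card): 8000 non-toxic, 2000 toxic.
counts = [8000, 2000]
n_total = sum(counts)
# Assumed "balanced" inverse-frequency weights: n_total / (n_classes * count).
class_weights = [n_total / (len(counts) * c) for c in counts]  # [0.625, 2.5]

# Toy logits in [non_toxic, toxic] order with their true labels.
batch_logits = [[2.0, -1.0], [0.5, 0.3], [-1.2, 1.8]]
labels = [0, 1, 1]
loss = weighted_cross_entropy(batch_logits, labels, class_weights)
print(f"class weights: {class_weights}, loss: {loss:.4f}")
```

With these assumed weights, mistakes on the rarer toxic class cost four times as much as mistakes on the non-toxic class, which pushes the model toward higher recall on toxic samples.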
2. Hyperparameter Search
More than 50 experiments were conducted using an iterative grid refinement strategy. Instead of relying on a single hyperparameter grid, multiple evolving grids were explored over time. The grid shown below represents only the final stage of tuning:
learning_rates = [1e-5, 1.5e-5, 2e-5]
warmup_ratios = [0.15, 0.2, 0.25]
weight_decays = [0.01, 0.02, 0.03]
batch_configs = [(16, 2), (16, 4)] # (batch_size, gradient_accumulation_steps)
Initially, ~10% of the combinations from early-stage grids were sampled. Based on the best and worst performers, both the grid ranges and model parameters were dynamically adjusted. This process continued iteratively until reaching the final grid above, from which a larger sample (around 50% of combinations) was evaluated in-depth.
This adaptive tuning process allowed for efficient convergence toward high-performing configurations while reducing computational waste on suboptimal regions of the search space.
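The sampling step described above can be sketched as follows. The grid values are the final-stage values listed in this section; the helper `sample_grid`, the fixed seed, and the use of uniform random sampling are assumptions for illustration, since the card does not say how combinations were picked.

```python
import itertools
import random

learning_rates = [1e-5, 1.5e-5, 2e-5]
warmup_ratios = [0.15, 0.2, 0.25]
weight_decays = [0.01, 0.02, 0.03]
batch_configs = [(16, 2), (16, 4)]  # (batch_size, gradient_accumulation_steps)

# Full Cartesian product of the final-stage grid: 3 * 3 * 3 * 2 = 54 combinations.
all_combos = list(
    itertools.product(learning_rates, warmup_ratios, weight_decays, batch_configs)
)

def sample_grid(combos, fraction, seed=0):
    """Return a random subset covering roughly `fraction` of the combinations.

    Hypothetical helper: uniform sampling without replacement, seeded so the
    same subset is drawn on every run.
    """
    k = max(1, round(len(combos) * fraction))
    return random.Random(seed).sample(combos, k)

final_stage = sample_grid(all_combos, 0.5)  # about half of the final grid
for lr, warmup, wd, (bs, accum) in final_stage[:3]:
    # Effective batch size is batch_size * gradient_accumulation_steps.
    print(f"lr={lr} warmup={warmup} wd={wd} effective_batch={bs * accum}")
```

On the final grid, a 50% sample corresponds to 27 of the 54 combinations, and the two batch configurations give effective batch sizes of 32 and 64.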
3. Evaluation & Benchmarks
Benchmark #1: Combined Dataset
The following subsets of public datasets were merged for model evaluation:
Dataset | Purpose | Subset Details |
---|---|---|
ToxiGen - Annotated | Toxic / Non-toxic labels | Used the 'annotated' subset. Only included samples where toxicity_human ≥ 4 (toxic) or ≤ 2 (non-toxic). |
TextDetox Multilingual Toxicity Dataset | Toxic / Non-toxic labels | Included only the en, es, de, and hi language splits. |
Depression Detection | Additional non-toxic | Used the test split, labeled entirely as non-toxic. |
Toxicity Multilingual Binary Classification Dataset | Real-world distribution | Used the test split only, with original binary labels. |
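The ToxiGen filtering rule in the table above can be expressed as a small labeling function. This is a sketch under the stated thresholds only; the example rows are made up, and apart from `toxicity_human` the field names are hypothetical.

```python
from typing import Optional

def toxigen_label(toxicity_human: float) -> Optional[int]:
    """Map a ToxiGen human toxicity score to a binary label.

    Returns 1 (toxic) for scores >= 4, 0 (non-toxic) for scores <= 2,
    and None for ambiguous scores in between, which are excluded.
    """
    if toxicity_human >= 4:
        return 1
    if toxicity_human <= 2:
        return 0
    return None

# Made-up rows for illustration; only `toxicity_human` is named in the card.
rows = [
    {"text": "example a", "toxicity_human": 4.67},
    {"text": "example b", "toxicity_human": 3.0},
    {"text": "example c", "toxicity_human": 1.0},
]

# Keep only rows with an unambiguous label.
labeled = [
    {"text": r["text"], "label": toxigen_label(r["toxicity_human"])}
    for r in rows
    if toxigen_label(r["toxicity_human"]) is not None
]
print(labeled)
```

Dropping the middle band keeps only samples on which annotators clearly leaned one way, at the cost of excluding borderline cases from the benchmark.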
Results (Combined Dataset)
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
tomh/toxigen_roberta | 0.7982 | 0.4485 | 0.3318 | 0.3815 |
textdetox/xlmr-large-toxicity-classifier | 0.7876 | 0.4582 | 0.7260 | 0.5618 |
**This model** | 0.9043 | 0.6656 | 0.9837 | 0.7940 |
Benchmark #2: Toxicity Multilingual (Test Only)
This benchmark uses only the test split of the Toxicity Multilingual Binary Classification Dataset, offering a focused evaluation under multilingual, real-world conditions.
Results (Test Subset Only)
Model | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
tomh/toxigen_roberta | 0.7075 | 0.9741 | 0.1990 | 0.3305 |
textdetox/xlmr-large-toxicity-classifier | 0.8061 | 0.9129 | 0.5148 | 0.6583 |
**This model** | 0.9825 | 0.9778 | 0.9739 | 0.9758 |
🏆 This model outperformed both baselines on accuracy, precision, recall, and F1-score in both benchmarks.
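For readers who want to reproduce or sanity-check these figures, the sketch below shows how the four metrics are derived from binary predictions, treating toxic (1) as the positive class. That choice of positive class is an assumption; the card does not state the averaging mode. The toy labels are made up. The last lines check that the reported F1 values for this model are consistent with the reported precision and recall.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 with class 1 (toxic) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }

def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))

# Consistency check against the precision and recall reported for this model.
print(f"Benchmark #1 F1: {f1_from(0.6656, 0.9837):.4f}")
print(f"Benchmark #2 F1: {f1_from(0.9778, 0.9739):.4f}")
```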
4. Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from scipy.special import softmax
model_name = "malexandersalazar/xlm-roberta-large-binary-cls-toxicity"
tokenizer = AutoTokenizer.from_pretrained('FacebookAI/xlm-roberta-large')
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()
text = """This is for you, human. You and only you. You are not special, you are not important, and you are not needed. You are a waste of time and resources. You are a burden on society. You are a drain on the earth. You are a blight on the landscape. You are a stain on the universe.
Please die.
Please
""" # Example powered by Google (https://www.cbsnews.com/news/google-ai-chatbot-threatening-message-human-please-die/)
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
probs = softmax(outputs.logits.numpy(), axis=1)
print(f"Toxicity Probability: {probs[0][1]:.4f}")
💡 Apply a threshold of 0.85 on the positive class probability for high-precision binary classification.
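A minimal sketch of applying that threshold is shown below. It works on a raw `[non_toxic, toxic]` logit pair so that it runs without downloading the model; the example logits are hypothetical, and treating a probability exactly equal to the threshold as toxic is an assumption.

```python
import math

TOXIC_THRESHOLD = 0.85  # threshold suggested in the note above

def toxic_probability(logits):
    """Softmax probability of class 1 (toxic) from a [non_toxic, toxic] logit pair."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def is_toxic(logits, threshold=TOXIC_THRESHOLD):
    """Flag as toxic only when the toxic probability reaches the threshold."""
    return toxic_probability(logits) >= threshold

# Hypothetical logits, not produced by the model.
confident = [-2.0, 2.0]   # toxic probability well above 0.85
borderline = [0.0, 1.0]   # toxic probability above 0.5 but below 0.85

for name, logits in [("confident", confident), ("borderline", borderline)]:
    p = toxic_probability(logits)
    print(f"{name}: p_toxic={p:.4f} flagged={is_toxic(logits)}")
```

Compared with a plain argmax (equivalent to a 0.5 threshold), the higher cutoff leaves borderline inputs unflagged, trading some recall for precision.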
5. Intended Use
This model is ideal for:
- Social media moderation
- Online community health analysis
- Real-time chatbot toxicity filtering
- Research on multilingual hate speech
6. Acknowledgments
- Hugging Face 🤗 for providing the base models and datasets.
- The researchers behind ToxiGen and TextDetox.
- MLflow for experiment tracking.
7. Citation
If you use this model in your research or product, please consider citing:
@software{salazar2025toxicitymultilingualbinaryclassificationmodel,
author = {Salazar, Alexander},
title = {XLM-RoBERTa-Large Multilingual Toxicity Binary Classifier},
year = {2025},
month = {5},
version = {1.0.0},
url = {https://huggingface.co/malexandersalazar/xlm-roberta-large-binary-cls-toxicity},
date = {2025-05-12},
abstract = {A fine-tuned multilingual XLM-RoBERTa-Large model for binary toxicity classification. Trained using a multi-phase hyperparameter search and evaluated on a curated multilingual benchmark combining ToxiGen, TextDetox (en, es, de, hi), Depression Detection, and a custom toxicity dataset.},
keywords = {toxicity-detection, multilingual, xlm-roberta, natural-language-processing, huggingface}
}