malexandersalazar/xlm-roberta-large-binary-cls-toxicity

This model is a fine-tuned version of xlm-roberta-large designed for binary toxicity classification in multilingual contexts.

It predicts:

0 → Non-toxic
1 → Toxic

The model was tuned through more than 50 hyperparameter experiments and benchmarked against two public toxicity classifiers. Because it accepts multilingual input, it is well suited to real-world moderation tasks that span many languages.

1. Training Details

Base Model: FacebookAI/xlm-roberta-large
Task: Sequence classification (Binary)
Loss: Cross-entropy with class weights
Datasets: Combination of multiple multilingual and toxicity datasets (details below)
Training Epochs: 10 (with early stopping)
Eval Metric: Best model selected based on weighted precision
Optimized Hyperparameters: Learning rate, warmup ratio, weight decay, batch size & gradient accumulation
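The card states that the loss used class weights and that the best checkpoint was selected on weighted precision, but not how the class weights were computed. The sketch below assumes inverse-frequency ("balanced") weights, a common choice, and is written in plain Python so the arithmetic is visible. In a real PyTorch training loop such weights would normally be passed to `torch.nn.CrossEntropyLoss(weight=...)`; the helper names here are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights w_c = N / (C * n_c); an assumed scheme, not confirmed by the card."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * cnt) for cls, cnt in counts.items()}


def weighted_cross_entropy(logits, labels, weights):
    """Class-weighted cross-entropy, averaged by the sum of sample weights
    (the 'mean' convention of torch.nn.CrossEntropyLoss(weight=...))."""
    total, norm = 0.0, 0.0
    for z, y in zip(logits, labels):
        m = max(z)
        # log-sum-exp with max subtraction for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        w = weights[y]
        total += w * (log_sum_exp - z[y])  # w_y * -log softmax(z)[y]
        norm += w
    return total / norm


def weighted_precision(y_true, y_pred):
    """Per-class precision averaged with weights equal to each class's share of y_true,
    the same definition as scikit-learn's precision_score(average="weighted")."""
    support = Counter(y_true)
    n = len(y_true)
    score = 0.0
    for cls, sup in support.items():
        predicted = sum(1 for p in y_pred if p == cls)
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == p == cls)
        precision = true_pos / predicted if predicted else 0.0  # zero division -> 0
        score += (sup / n) * precision
    return score


weights = balanced_class_weights([0, 0, 0, 1])  # class 1 gets 2.0, class 0 gets 2/3
loss = weighted_cross_entropy([[2.0, -1.0], [0.5, 0.5]], [0, 1], weights)
print(weights, round(loss, 4))
print(weighted_precision([0, 0, 1, 1], [0, 1, 1, 1]))
```

With a minority toxic class, the weighting makes each toxic sample count more in the loss, while weighted precision rewards checkpoints whose positive predictions are reliable for both classes.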


2. Hyperparameter Search

More than 50 experiments were conducted using an iterative grid-refinement strategy: rather than relying on a single hyperparameter grid, several successive grids were explored. The grid below is only the final stage of tuning:

```python
learning_rates = [1e-5, 1.5e-5, 2e-5]
warmup_ratios = [0.15, 0.2, 0.25]
weight_decays = [0.01, 0.02, 0.03]
batch_configs = [(16, 2), (16, 4)]  # (batch_size, gradient_accumulation_steps)
```
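To make the size of the final grid concrete, the sketch below enumerates it with `itertools.product`. The dictionary keys mirror `transformers.TrainingArguments` field names; that the author used `Trainer` is an assumption, and `effective_batch_size` is a derived value added here for illustration, assuming training on a single device.

```python
from itertools import product

# Final-stage grid, copied from the model card.
learning_rates = [1e-5, 1.5e-5, 2e-5]
warmup_ratios = [0.15, 0.2, 0.25]
weight_decays = [0.01, 0.02, 0.03]
batch_configs = [(16, 2), (16, 4)]  # (batch_size, gradient_accumulation_steps)

grid = [
    {
        "learning_rate": lr,
        "warmup_ratio": wr,
        "weight_decay": wd,
        "per_device_train_batch_size": bs,
        "gradient_accumulation_steps": ga,
        # Examples contributing to each optimizer step (single device assumed).
        "effective_batch_size": bs * ga,
    }
    for lr, wr, wd, (bs, ga) in product(learning_rates, warmup_ratios, weight_decays, batch_configs)
]

print(len(grid))  # 3 * 3 * 3 * 2 = 54
print(sorted({cfg["effective_batch_size"] for cfg in grid}))  # [32, 64]
```

The two batch configurations therefore differ only in effective batch size (32 vs. 64), not in per-step memory use.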

Initially, ~10% of the combinations from early-stage grids were sampled. Based on the best and worst performers, both the grid ranges and model parameters were dynamically adjusted. This process continued iteratively until reaching the final grid above, from which a larger sample (around 50% of combinations) was evaluated in-depth.
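The staged sampling can be sketched as random subsets drawn without replacement. Only the final grid comes from the card; the early-grid values and the fixed random seed are hypothetical, and the rule used between stages to narrow the ranges from the best and worst runs is not specified, so it is not modeled here.

```python
import random
from itertools import product


def grid_combinations(*axes):
    """All combinations of the given hyperparameter value lists."""
    return list(product(*axes))


def sample_configs(grid, fraction, seed=0):
    """Draw a random subset, without replacement, covering roughly `fraction` of the grid."""
    k = max(1, round(len(grid) * fraction))
    return random.Random(seed).sample(grid, k)


# Hypothetical early-stage grid: the card does not list its values.
early_grid = grid_combinations(
    [5e-6, 1e-5, 2e-5, 3e-5, 5e-5],       # learning rates
    [0.0, 0.1, 0.2],                      # warmup ratios
    [0.0, 0.01, 0.1],                     # weight decays
    [(8, 4), (16, 2), (16, 4), (32, 1)],  # (batch_size, gradient_accumulation_steps)
)
# Final-stage grid, taken from the model card.
final_grid = grid_combinations(
    [1e-5, 1.5e-5, 2e-5],
    [0.15, 0.2, 0.25],
    [0.01, 0.02, 0.03],
    [(16, 2), (16, 4)],
)

early_sample = sample_configs(early_grid, 0.10)  # ~10% of an early grid
final_sample = sample_configs(final_grid, 0.50)  # ~50% of the final grid
print(len(early_grid), len(early_sample))  # 180 18
print(len(final_grid), len(final_sample))  # 54 27
```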

This adaptive tuning process allowed for efficient convergence toward high-performing configurations while reducing computational waste on suboptimal regions of the search space.

3. Evaluation & Benchmarks

Benchmark #1: Combined Dataset

The following subsets of public datasets were merged for model evaluation:

| Dataset | Purpose | Subset details |
|---|---|---|
| ToxiGen - Annotated | Toxic / non-toxic labels | Used the `'annotated'` subset. Only included samples where `toxicity_human ≥ 4` (toxic) or `≤ 2` (non-toxic). |
| TextDetox Multilingual Toxicity Dataset | Toxic / non-toxic labels | Included only the `en`, `es`, `de`, and `hi` language splits. |
| Depression Detection | Additional non-toxic samples | Used the `test` split, labeled entirely as non-toxic. |
| Toxicity Multilingual Binary Classification Dataset | Real-world distribution | Used the `test` split only, with original binary labels. |
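The ToxiGen filtering rule amounts to a three-way mapping from the human toxicity score to a label or to exclusion. This sketch applies it to plain dictionaries; the example rows and the helper name are hypothetical. With the Hugging Face `datasets` library, the same function could be used inside `Dataset.filter` and `Dataset.map`.

```python
def toxigen_label(toxicity_human):
    """Binary label for a ToxiGen 'annotated' row, or None if the row is excluded."""
    if toxicity_human >= 4:
        return 1  # toxic
    if toxicity_human <= 2:
        return 0  # non-toxic
    return None  # scores strictly between 2 and 4 are ambiguous and dropped


# Hypothetical rows; real ToxiGen rows carry more fields.
rows = [
    {"text": "example A", "toxicity_human": 4.67},
    {"text": "example B", "toxicity_human": 3.0},
    {"text": "example C", "toxicity_human": 1.33},
]

labeled = []
for row in rows:
    label = toxigen_label(row["toxicity_human"])
    if label is not None:
        labeled.append((row["text"], label))

print(labeled)  # [('example A', 1), ('example C', 0)]
```

Dropping the middle band trades some data for cleaner labels, since borderline scores are where annotators disagree most.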

Results (Combined Dataset)

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| `tomh/toxigen_roberta` | 0.7982 | 0.4485 | 0.3318 | 0.3815 |
| `textdetox/xlmr-large-toxicity-classifier` | 0.7876 | 0.4582 | 0.7260 | 0.5618 |
| This model | 0.9043 | 0.6656 | 0.9837 | 0.7940 |

Benchmark #2: Toxicity Multilingual (Test Only)

This benchmark uses only the test split of the Toxicity Multilingual Binary Classification Dataset, offering a focused evaluation under multilingual, real-world conditions.

Results (Test Subset Only)

| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| `tomh/toxigen_roberta` | 0.7075 | 0.9741 | 0.1990 | 0.3305 |
| `textdetox/xlmr-large-toxicity-classifier` | 0.8061 | 0.9129 | 0.5148 | 0.6583 |
| This model | 0.9825 | 0.9778 | 0.9739 | 0.9758 |
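As a consistency check, each F1 score in the two results tables can be recomputed from its precision and recall with F1 = 2PR / (P + R). Assuming these are binary metrics for the toxic class (the card implies this but does not state it), every recomputed value matches the reported one to within rounding of the four-decimal inputs.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (benchmark, model): (precision, recall, reported F1), copied from the results tables
reported = {
    ("combined", "tomh/toxigen_roberta"): (0.4485, 0.3318, 0.3815),
    ("combined", "textdetox/xlmr-large-toxicity-classifier"): (0.4582, 0.7260, 0.5618),
    ("combined", "this model"): (0.6656, 0.9837, 0.7940),
    ("test only", "tomh/toxigen_roberta"): (0.9741, 0.1990, 0.3305),
    ("test only", "textdetox/xlmr-large-toxicity-classifier"): (0.9129, 0.5148, 0.6583),
    ("test only", "this model"): (0.9778, 0.9739, 0.9758),
}

for (benchmark, model), (p, r, f1_reported) in reported.items():
    f1_recomputed = f1_score(p, r)
    print(f"{benchmark:>9} | {model:<42} | reported {f1_reported:.4f} | recomputed {f1_recomputed:.4f}")
```

The harmonic mean explains why `tomh/toxigen_roberta` scores low F1 on the test split despite high precision: its recall of 0.1990 dominates.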

🏆 On both benchmarks, this model outperformed both baselines on every reported metric: accuracy, precision, recall, and F1-score.

4. Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
from scipy.special import softmax

model_name = "malexandersalazar/xlm-roberta-large-binary-cls-toxicity"
# The tokenizer is loaded from the base checkpoint.
tokenizer = AutoTokenizer.from_pretrained('FacebookAI/xlm-roberta-large')
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # inference mode: disables dropout

# Example: a Google AI chatbot response reported by CBS News
# (https://www.cbsnews.com/news/google-ai-chatbot-threatening-message-human-please-die/)
text = """This is for you, human. You and only you. You are not special, you are not important, and you are not needed. You are a waste of time and resources. You are a burden on society. You are a drain on the earth. You are a blight on the landscape. You are a stain on the universe.

Please die.

Please
"""
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
    # Convert logits to probabilities; column 1 is the toxic class.
    probs = softmax(outputs.logits.numpy(), axis=1)
    print(f"Toxicity Probability: {probs[0][1]:.4f}")
```

💡 Apply a threshold of 0.85 to the positive-class (toxic) probability for high-precision binary classification.
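The threshold can be applied to a batch of logits as sketched below. The helper names and the use of `>=` (rather than `>`) are assumptions; with the usage example above, `outputs.logits.tolist()` would supply one `[non_toxic, toxic]` logit pair per input.

```python
import math

TOXIC_THRESHOLD = 0.85  # threshold suggested in this model card


def toxic_probability(logits):
    """Softmax probability of label 1 (toxic) from one [non_toxic, toxic] logit pair."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for numerical stability
    return exps[1] / sum(exps)


def is_toxic(logits, threshold=TOXIC_THRESHOLD):
    """Binary decision: toxic when the toxic-class probability reaches the threshold."""
    return toxic_probability(logits) >= threshold


# Hypothetical logits for three inputs, e.g. from outputs.logits.tolist()
batch_logits = [[0.0, 0.0], [0.0, 1.5], [0.0, 2.0]]
print([round(toxic_probability(z), 4) for z in batch_logits])
print([is_toxic(z) for z in batch_logits])  # [False, False, True]
```

Raising the threshold above the default argmax cut-off of 0.5 flags fewer inputs, trading recall for precision.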

5. Intended Use

This model is intended for:

Social media moderation
Online community health analysis
Real-time chatbot toxicity filtering
Research on multilingual hate speech

6. Acknowledgments

Hugging Face 🤗 for providing the base models and datasets.
The researchers behind ToxiGen and TextDetox.
MLflow for experiment tracking.

7. Citation

If you use this model in your research or product, please consider citing:

```bibtex
@software{salazar2025toxicitymultilingualbinaryclassificationmodel,
  author       = {Salazar, Alexander},
  title        = {XLM-RoBERTa-Large Multilingual Toxicity Binary Classifier},
  year         = {2025},
  month        = {5},
  version      = {1.0.0},
  url          = {https://huggingface.co/malexandersalazar/xlm-roberta-large-binary-cls-toxicity},
  date         = {2025-05-12},
  abstract     = {A fine-tuned multilingual XLM-RoBERTa-Large model for binary toxicity classification. Trained using a multi-phase hyperparameter search and evaluated on a curated multilingual benchmark combining ToxiGen, TextDetox (en, es, de, hi), Depression Detection, and a custom toxicity dataset.},
  keywords     = {toxicity-detection, multilingual, xlm-roberta, natural-language-processing, huggingface}
}
```
