# XLM-RoBERTa Toxicity Classifier
This model is a fine-tuned version of FacebookAI/xlm-roberta-base for multi-label toxicity classification.
## Model Description
This model can classify text into the following toxicity categories:
- Toxic
- Severe Toxic
- Obscene
- Threat
- Insult
- Identity Hate
- None (for non-toxic content)
## Usage
```python
from transformers import XLMRobertaForSequenceClassification, XLMRobertaTokenizer
import torch

# Load model and tokenizer
model = XLMRobertaForSequenceClassification.from_pretrained("oleksiizirka/xlm-roberta-toxicity-classifier")
tokenizer = XLMRobertaTokenizer.from_pretrained("oleksiizirka/xlm-roberta-toxicity-classifier")

# Prepare input
text = "Your text here"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)

# Get predictions (independent sigmoid probability per label)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Print labels whose probability exceeds 0.5
labels = ['toxic', 'severe_toxic', 'obscene', 'threat', 'insult', 'identity_hate', 'none']
for label, score in zip(labels, predictions[0]):
    if score > 0.5:
        print(f"{label}: {score:.3f}")
```
## Training Data
The model was trained on the Jigsaw Toxic Comment Classification dataset.
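The Jigsaw data labels each comment with six binary toxicity columns. A minimal preprocessing sketch for building the seven-dimensional multi-hot target used here (the extra `none` label marks comments with no toxicity flag set) might look like the following; the CSV path and column handling are assumptions, not the exact training script.

```python
import pandas as pd
import torch

TOXICITY_COLUMNS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]

def build_targets(csv_path: str) -> tuple[list[str], torch.Tensor]:
    """Read a local copy of the Jigsaw train CSV and build multi-hot targets with an extra 'none' label."""
    df = pd.read_csv(csv_path)
    toxic_flags = torch.tensor(df[TOXICITY_COLUMNS].values, dtype=torch.float32)
    # 'none' is 1 only when every toxicity flag is 0
    none_flag = (toxic_flags.sum(dim=1) == 0).float().unsqueeze(1)
    targets = torch.cat([toxic_flags, none_flag], dim=1)  # shape: (num_rows, 7)
    return df["comment_text"].tolist(), targets
```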
## Training Procedure
- Base model: FacebookAI/xlm-roberta-base
- Training approach: multi-label classification with BCEWithLogitsLoss (a rough sketch follows this list)
- Optimizer: AdamW with a learning rate of 2e-5
- Batch size: 16
- Epochs: 3-5 with early stopping
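As an illustration of the setup above, a training loop with BCEWithLogitsLoss, AdamW, and early stopping could be structured as follows. The data loaders, the `evaluate` helper, and the patience value are placeholders, not the original training code; `problem_type="multi_label_classification"` makes the Transformers model compute BCEWithLogitsLoss internally.

```python
from torch.optim import AdamW
from transformers import XLMRobertaForSequenceClassification

model = XLMRobertaForSequenceClassification.from_pretrained(
    "FacebookAI/xlm-roberta-base",
    num_labels=7,
    problem_type="multi_label_classification",  # use BCEWithLogitsLoss for multi-label targets
)
optimizer = AdamW(model.parameters(), lr=2e-5)

best_val_loss, patience, bad_epochs = float("inf"), 2, 0  # placeholder early-stopping settings
for epoch in range(5):
    model.train()
    for batch in train_loader:  # assumed DataLoader yielding tokenized batches with float multi-hot labels
        optimizer.zero_grad()
        loss = model(**batch).loss  # BCEWithLogitsLoss via problem_type above
        loss.backward()
        optimizer.step()

    val_loss = evaluate(model, val_loader)  # assumed validation helper returning mean loss
    if val_loss < best_val_loss:
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break  # stop early once validation loss stops improving
```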
## Limitations
- Trained primarily on English text
- May exhibit biases present in the training data
- Should be used as part of a larger content moderation system