Use Model
```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the fine-tuned model and its tokenizer
identity_model = AutoModelForSequenceClassification.from_pretrained("Mridul2003/identity-hate-detector").to(device)
identity_tokenizer = AutoTokenizer.from_pretrained("Mridul2003/identity-hate-detector")

final_text = "your text to classify"
results = {}

# Tokenize and move tensors to the target device
identity_inputs = identity_tokenizer(final_text, return_tensors="pt", padding=True, truncation=True)
if "token_type_ids" in identity_inputs:
    del identity_inputs["token_type_ids"]  # not used by this model
identity_inputs = {k: v.to(device) for k, v in identity_inputs.items()}

# Run inference and convert logits to probabilities
with torch.no_grad():
    identity_outputs = identity_model(**identity_inputs)
identity_probs = torch.sigmoid(identity_outputs.logits)

# Index 1 = Identity Hate, index 0 = Not Identity Hate
identity_prob = identity_probs[0][1].item()
not_identity_prob = identity_probs[0][0].item()
results["identity_hate_custom"] = identity_prob
results["not_identity_hate_custom"] = not_identity_prob
```
Offensive Language Classifier (Fine-Tuned on Custom Dataset)
This repository contains a fine-tuned version of the unitary/toxic-bert model for binary classification of offensive language (labels: Offensive vs Not Offensive). The model was fine-tuned on a custom dataset due to limitations observed in the base model's performance, particularly on identity_hate-related content.
Problem with Base Model (unitary/toxic-bert)
The original unitary/toxic-bert model is trained for multi-label toxicity detection with 6 categories:
- toxic
- severe_toxic
- obscene
- threat
- insult
- identity_hate
While it performs reasonably well on generic toxicity, it struggles with edge cases involving identity-based hate speech, often:
- Misclassifying subtle or sarcastic identity attacks
- Underestimating offensive content with identity-specific slurs
Why Fine-Tune?
We fine-tuned the model on a custom annotated dataset with two clear labels:
- 0: Not Identity Hate
- 1: Identity Hate
The new model simplifies the task into a binary classification problem, allowing more focused training for real-world moderation scenarios.
Dataset Overview
- Total examples: ~4,000+
- Balanced between offensive and non-offensive labels
- Contains high proportions of identity_hate, obscene, insult, and more nuanced samples
Model Details
- Base model: unitary/toxic-bert
- Fine-tuned using: the Hugging Face Trainer API (see the sketch below)
- Loss function: CrossEntropyLoss (via num_labels=2)
- Batch size: 8
- Epochs: 3
- Learning rate: 2e-5
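The training script itself is not included in this card, but the hyperparameters above map directly onto the Trainer API. Below is a minimal sketch under that assumption; the placeholder dataset, its column names (`text`, `label`), and the output directory are illustrative, not part of this repository.

```python
from datasets import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

# Placeholder data standing in for the custom annotated dataset
# (0 = Not Identity Hate, 1 = Identity Hate)
train_ds = Dataset.from_dict({"text": ["example comment", "another comment"], "label": [0, 1]})
eval_ds = Dataset.from_dict({"text": ["held-out comment"], "label": [0]})

tokenizer = AutoTokenizer.from_pretrained("unitary/toxic-bert")
model = AutoModelForSequenceClassification.from_pretrained(
    "unitary/toxic-bert",
    num_labels=2,                  # binary head -> CrossEntropyLoss
    ignore_mismatched_sizes=True,  # the base model ships a 6-label head
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True)

train_ds = train_ds.map(tokenize, batched=True)
eval_ds = eval_ds.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="identity-hate-detector",
    per_device_train_batch_size=8,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
```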
Performance (Binary Classification)
| Metric | Value |
|---|---|
| Accuracy | ~92% |
| Precision / Recall | Balanced |
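The evaluation script is likewise not published here; figures like these could be reproduced with standard scikit-learn metrics over a held-out split. A minimal sketch, assuming arrays of gold and predicted labels:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical label arrays: y_true are gold labels, y_pred are the model's
# argmax predictions (e.g. outputs.logits.argmax(-1)) on a held-out split.
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```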