ModernBERT Fact-Checking Model

Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on consolidated datasets from multiple authoritative sources. The model determines whether a given claim is likely to be true (label 1) or false (label 0).

Base Model: answerdotai/ModernBERT-base

Intended Uses

Primary Use

  • Automated fact-checking systems
  • Misinformation detection pipelines
  • Content moderation tools

Out-of-Scope Uses

  • Multilingual fact-checking (English only)
  • Medical/legal claim verification
  • Highly domain-specific claims

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
predicted_label = predictions.argmax(dim=-1).item()  # 1 = likely true, 0 = likely false

Training Data

The model was trained on a combination of four datasets:

Dataset     Samples    Domain
FELM         34,000    General claims
FEVER       145,000    Wikipedia-based claims
HaluEval     12,000    QA hallucination detection
LIAR         12,800    Political claims

Total training samples: ~203,800
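The card does not specify how each source's native labels were collapsed to the binary scheme. As an illustration only, the binarization step might look like the sketch below; the field names, the decision to drop FEVER's "NOT ENOUGH INFO" claims, and the placement of LIAR's "half-true" rating are assumptions, not the documented preprocessing:

```python
# Illustrative label binarization; the exact mappings used for training are assumptions.
FEVER_MAP = {"SUPPORTS": 1, "REFUTES": 0}  # "NOT ENOUGH INFO" claims dropped (assumed)
LIAR_MAP = {  # six-way LIAR ratings collapsed to binary (assumed split)
    "true": 1, "mostly-true": 1, "half-true": 1,
    "barely-true": 0, "false": 0, "pants-fire": 0,
}

def binarize(example, source):
    """Map a raw example to the standardized record, or None if it should be dropped."""
    mapping = {"fever": FEVER_MAP, "liar": LIAR_MAP}[source]
    label = mapping.get(example["label"])
    if label is None:
        return None  # no binary label available for this rating
    return {"text": example["claim"], "label": float(label), "source": source}

print(binarize({"claim": "Paris is in France.", "label": "SUPPORTS"}, "fever"))
# → {'text': 'Paris is in France.', 'label': 1.0, 'source': 'fever'}
```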

Training Procedure

Hyperparameters

  • Learning Rate: 5e-5
  • Batch Size: 32
  • Epochs: 1
  • Max Sequence Length: 512 tokens
  • Optimizer: adamw_torch_fused
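The listed hyperparameters map onto a Hugging Face `TrainingArguments` object roughly as follows; the output directory and any setting not listed above are assumptions, not the exact training configuration:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as a Trainer config.
# output_dir is an assumption; unlisted settings keep their defaults.
training_args = TrainingArguments(
    output_dir="modernbert-factchecking",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
# Note: the max sequence length (512) is applied at tokenization time,
# via the tokenizer's truncation/max_length arguments, not here.
```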

Preprocessing

All datasets were converted to a standardized format:

{
  "text": "full claim text",
  "label": 0.0 or 1.0,
  "source": "dataset_name"
}
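A record in this format can be checked with a small helper before training; the function name and the strictness of the checks are illustrative, not part of the released pipeline:

```python
def validate_record(record):
    """Check that a record matches the standardized training format."""
    if set(record) != {"text", "label", "source"}:
        raise ValueError(f"unexpected keys: {sorted(record)}")
    if not isinstance(record["text"], str) or not record["text"].strip():
        raise ValueError("text must be a non-empty string")
    if record["label"] not in (0.0, 1.0):
        raise ValueError("label must be 0.0 or 1.0")
    if not isinstance(record["source"], str):
        raise ValueError("source must be a dataset name string")
    return record

validate_record({"text": "full claim text", "label": 1.0, "source": "fever"})
```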
Model Size

150M parameters (F32, Safetensors format)