ModernBERT Fact-Checking Model

Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on consolidated datasets from multiple authoritative sources. The model determines whether a given claim is likely to be true (label 1) or false (label 0).

Base Model: answerdotai/ModernBERT-base

Intended Uses

Primary Use

  • Automated fact-checking systems
  • Misinformation detection pipelines
  • Content moderation tools

Out-of-Scope Uses

  • Multilingual fact-checking (English only)
  • Medical/legal claim verification
  • Highly domain-specific claims

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
predicted_label = predictions.argmax(dim=-1).item()  # 1 = likely true, 0 = likely false

Training Data

The model was trained on a combination of four datasets:

Dataset     Samples    Domain
FELM         34,000    General claims
FEVER       145,000    Wikipedia-based claims
HaluEval     12,000    QA hallucination detection
LIAR         12,800    Political claims

Total training samples: ~203,800
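The card does not specify how each source's native labels were collapsed to the binary scheme. As an illustration only, the binarization step might look like the sketch below; the field names, the decision to drop FEVER's "NOT ENOUGH INFO" claims, and the placement of LIAR's "half-true" rating are assumptions, not the documented preprocessing:

```python
# Illustrative label binarization; the exact mappings used for training are assumptions.
FEVER_MAP = {"SUPPORTS": 1, "REFUTES": 0}  # "NOT ENOUGH INFO" claims dropped (assumed)
LIAR_MAP = {  # six-way LIAR ratings collapsed to binary (assumed split)
    "true": 1, "mostly-true": 1, "half-true": 1,
    "barely-true": 0, "false": 0, "pants-fire": 0,
}

def binarize(example, source):
    """Map a raw example to the standardized record, or None if it should be dropped."""
    mapping = {"fever": FEVER_MAP, "liar": LIAR_MAP}[source]
    label = mapping.get(example["label"])
    if label is None:
        return None  # no binary label available for this rating
    return {"text": example["claim"], "label": float(label), "source": source}

print(binarize({"claim": "Paris is in France.", "label": "SUPPORTS"}, "fever"))
# → {'text': 'Paris is in France.', 'label': 1.0, 'source': 'fever'}
```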

Training Procedure

Hyperparameters

  • Learning Rate: 5e-5
  • Batch Size: 32
  • Epochs: 1
  • Max Sequence Length: 512 tokens
  • Optimizer: adamw_torch_fused
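The listed hyperparameters map onto a Hugging Face `TrainingArguments` object roughly as follows; the output directory and any setting not listed above are assumptions, not the exact training configuration:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters as a Trainer config.
# output_dir is an assumption; unlisted settings keep their defaults.
training_args = TrainingArguments(
    output_dir="modernbert-factchecking",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
# Note: the max sequence length (512) is applied at tokenization time,
# via the tokenizer's truncation/max_length arguments, not here.
```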

Preprocessing

All datasets were converted to a standardized format:

{
  "text": "full claim text",
  "label": 0.0 or 1.0,
  "source": "dataset_name"
}
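A record in this format can be checked with a small helper before training; the function name and the strictness of the checks are illustrative, not part of the released pipeline:

```python
def validate_record(record):
    """Check that a record matches the standardized training format."""
    if set(record) != {"text", "label", "source"}:
        raise ValueError(f"unexpected keys: {sorted(record)}")
    if not isinstance(record["text"], str) or not record["text"].strip():
        raise ValueError("text must be a non-empty string")
    if record["label"] not in (0.0, 1.0):
        raise ValueError("label must be 0.0 or 1.0")
    if not isinstance(record["source"], str):
        raise ValueError("source must be a dataset name string")
    return record

validate_record({"text": "full claim text", "label": 1.0, "source": "fever"})
```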
Model Size

150M parameters (F32, Safetensors format)