---
language:
- en
tags:
- fact-checking
- misinformation-detection
- bert
- modernbert
datasets:
- FELM
- FEVER
- HaluEval
- LIAR
metrics:
- accuracy
- f1
---
# ModernBERT Fact-Checking Model

## Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidated corpus drawn from four public fact-checking and hallucination-detection datasets. Given a claim, the model predicts whether it is likely to be true (label 1) or false (label 0).

**Base Model:** answerdotai/ModernBERT-base
## Intended Uses

### Primary Use

- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools

### Out-of-Scope Uses

- Multilingual fact-checking (English only)
- Medical/legal claim verification
- Highly domain-specific claims
## How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
```
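The output indices follow the labels described above: index 0 corresponds to "false" and index 1 to "true". A minimal way to turn the softmax output into a verdict:

```python
# predictions has shape [1, 2]; index 1 is the probability that the claim is true.
predicted_class = predictions.argmax(dim=-1).item()
confidence = predictions[0, predicted_class].item()
print(f"Prediction: {'true' if predicted_class == 1 else 'false'} (confidence: {confidence:.2f})")
```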
## Training Data

The model was trained on a combination of four datasets:

| Dataset  | Samples | Domain                     |
|----------|---------|----------------------------|
| FELM     | 34,000  | General claims             |
| FEVER    | 145,000 | Wikipedia-based claims     |
| HaluEval | 12,000  | QA hallucination detection |
| LIAR     | 12,800  | Political claims           |

**Total training samples:** ~203,800
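One way the four corpora could be merged, assuming each has already been converted to the standardized record format described under Preprocessing below; the file names and `datasets` usage here are illustrative, not the exact training pipeline:

```python
import json

from datasets import Dataset, concatenate_datasets

def load_standardized(path: str) -> Dataset:
    """Load one dataset already converted to {"text", "label", "source"} records."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return Dataset.from_list(records)

# Hypothetical file names; the actual conversion scripts are not part of this card.
parts = [load_standardized(f"{name}.jsonl") for name in ("felm", "fever", "halueval", "liar")]
combined = concatenate_datasets(parts).shuffle(seed=42)
```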
## Training Procedure

### Hyperparameters

- Learning Rate: 5e-5
- Batch Size: 32
- Epochs: 1
- Max Sequence Length: 512 tokens
- Optimizer: adamw_torch_fused
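A rough sketch of how these settings might map onto the transformers `TrainingArguments`; the output directory is an assumption, and the 512-token maximum is applied at tokenization time (as in the inference example above) rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-factchecking",  # assumed output path, not documented above
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
```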
### Preprocessing

All datasets were converted to a standardized format:

```json
{
  "text": "full claim text",
  "label": 0.0 or 1.0,
  "source": "dataset_name"
}
```
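A minimal sketch of a conversion helper that produces this record shape; the `to_record` name and the example claim are illustrative, and the per-dataset mapping of source-specific labels onto the binary scheme is not specified in this card:

```python
def to_record(claim: str, is_true: bool, source: str) -> dict:
    """Wrap one claim in the standardized training record."""
    return {
        "text": claim,
        "label": 1.0 if is_true else 0.0,  # 1.0 = true, 0.0 = false
        "source": source,
    }

record = to_record("The Eiffel Tower is in Paris.", True, "fever")
```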