---
language:
- en
tags:
- fact-checking
- misinformation-detection
- bert
- modernbert
datasets:
- FELM
- FEVER
- HaluEval
- LIAR
metrics:
- accuracy
- f1
---
# ModernBERT Fact-Checking Model

## Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidated corpus drawn from four public fact-checking and hallucination-detection datasets. Given a claim, the model predicts whether it is likely to be true (label 1) or false (label 0).

**Base Model:** answerdotai/ModernBERT-base
## Intended Uses

### Primary Use

- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools

### Out-of-Scope Uses

- Multilingual fact-checking (English only)
- Medical/legal claim verification
- Highly domain-specific claims
## How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)
```
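The output indices follow the labels described above: index 0 corresponds to "false" and index 1 to "true". A minimal way to turn the softmax output into a verdict:

```python
# predictions has shape [1, 2]; index 1 is the probability that the claim is true.
predicted_class = predictions.argmax(dim=-1).item()
confidence = predictions[0, predicted_class].item()
print(f"Prediction: {'true' if predicted_class == 1 else 'false'} (confidence: {confidence:.2f})")
```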
## Training Data

The model was trained on a combination of four datasets:

| Dataset  | Samples | Domain                     |
|----------|---------|----------------------------|
| FELM     | 34,000  | General claims             |
| FEVER    | 145,000 | Wikipedia-based claims     |
| HaluEval | 12,000  | QA hallucination detection |
| LIAR     | 12,800  | Political claims           |

**Total training samples:** ~203,800
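One way the four corpora could be merged, assuming each has already been converted to the standardized record format described under Preprocessing below; the file names and `datasets` usage here are illustrative, not the exact training pipeline:

```python
import json

from datasets import Dataset, concatenate_datasets

def load_standardized(path: str) -> Dataset:
    """Load one dataset already converted to {"text", "label", "source"} records."""
    with open(path) as f:
        records = [json.loads(line) for line in f]
    return Dataset.from_list(records)

# Hypothetical file names; the actual conversion scripts are not part of this card.
parts = [load_standardized(f"{name}.jsonl") for name in ("felm", "fever", "halueval", "liar")]
combined = concatenate_datasets(parts).shuffle(seed=42)
```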
## Training Procedure

### Hyperparameters

- Learning Rate: 5e-5
- Batch Size: 32
- Epochs: 1
- Max Sequence Length: 512 tokens
- Optimizer: adamw_torch_fused
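A rough sketch of how these settings might map onto the transformers `TrainingArguments`; the output directory is an assumption, and the 512-token maximum is applied at tokenization time (as in the inference example above) rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-factchecking",  # assumed output path, not documented above
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
```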
### Preprocessing

All datasets were converted to a standardized format:

```json
{
  "text": "full claim text",
  "label": 0.0 or 1.0,
  "source": "dataset_name"
}
```
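A minimal sketch of a conversion helper that produces this record shape; the `to_record` name and the example claim are illustrative, and the per-dataset mapping of source-specific labels onto the binary scheme is not specified in this card:

```python
def to_record(claim: str, is_true: bool, source: str) -> dict:
    """Wrap one claim in the standardized training record."""
    return {
        "text": claim,
        "label": 1.0 if is_true else 0.0,  # 1.0 = true, 0.0 = false
        "source": source,
    }

record = to_record("The Eiffel Tower is in Paris.", True, "fever")
```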