---
language:
  - en
tags:
  - fact-checking
  - misinformation-detection
  - bert
  - modernbert
datasets:
  - FELM
  - FEVER
  - HaluEval
  - LIAR
metrics:
  - accuracy
  - f1
---

# ModernBERT Fact-Checking Model

## Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidated corpus drawn from four public fact-checking and hallucination-detection datasets (FELM, FEVER, HaluEval, LIAR). Given a claim, the model predicts whether it is likely to be true (label 1) or false (label 0).

**Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

## Intended Uses

### Primary Use

- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools

### Out-of-Scope Uses

- Non-English fact-checking (the model was trained on English data only)
- Medical or legal claim verification
- Highly domain-specific claims

## How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

# Tokenize the claim and run a forward pass
inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)
outputs = model(**inputs)
predictions = torch.softmax(outputs.logits, dim=-1)  # probabilities for [false, true]
```
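
To turn the probabilities into a verdict, take the argmax over the two classes. This continuation is a minimal sketch using the label mapping described above (0 = false, 1 = true):

```python
label_names = {0: "false", 1: "true"}
pred = predictions.argmax(dim=-1).item()
print(f"Prediction: {label_names[pred]} (p={predictions[0, pred].item():.3f})")
```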

## Training Data

The model was trained on a combination of four datasets:

| Dataset  | Samples | Domain                     |
|----------|---------|----------------------------|
| FELM     | 34,000  | General claims             |
| FEVER    | 145,000 | Wikipedia-based claims     |
| HaluEval | 12,000  | QA hallucination detection |
| LIAR     | 12,800  | Political claims           |

**Total training samples:** ~203,800
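
One way to combine the four sources, assuming each has already been exported to the standardized JSONL format described under Preprocessing below (the file names here are hypothetical):

```python
from datasets import load_dataset

# Hypothetical file names; each JSONL file holds records in the
# standardized {"text", "label", "source"} format described below.
combined = load_dataset(
    "json",
    data_files={"train": ["felm.jsonl", "fever.jsonl", "halueval.jsonl", "liar.jsonl"]},
    split="train",
)
print(combined)  # ~203,800 rows across the four sources
```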

## Training Procedure

### Hyperparameters

- Learning rate: 5e-5
- Batch size: 32
- Epochs: 1
- Max sequence length: 512 tokens
- Optimizer: adamw_torch_fused
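
These settings map roughly onto `transformers.TrainingArguments` as sketched below; the output directory is a placeholder, and the 512-token limit is applied at tokenization time rather than here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="modernbert-factchecking",  # placeholder path
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
```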

### Preprocessing

All datasets were converted to a standardized format:

```
{
  "text": "full claim text",
  "label": 0.0 or 1.0,
  "source": "dataset_name"
}
```
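
A minimal sketch of that normalization step, assuming each source exposes a claim string and a binary truth value (the function name and example record below are illustrative):

```python
def to_standard_format(claim: str, is_true: bool, source: str) -> dict:
    """Map one raw example onto the standardized schema above."""
    return {
        "text": claim,
        "label": 1.0 if is_true else 0.0,  # 1.0 = true, 0.0 = false
        "source": source,                  # e.g. "FEVER", "LIAR"
    }

# Illustrative usage
record = to_standard_format("Paris is the capital of France.", True, "FEVER")
```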