---
language:
- en
tags:
- fact-checking
- misinformation-detection
- bert
- modernbert
datasets:
- FELM
- FEVER
- HaluEval
- LIAR
metrics:
- accuracy
- f1
---

# ModernBERT Fact-Checking Model

## Model Description

This is a fine-tuned ModernBERT model for binary fact-checking classification, trained on a consolidated corpus drawn from four established fact-checking and hallucination-detection datasets. Given a claim, the model predicts whether it is likely true (label 1) or false (label 0).

**Base Model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base)

## Intended Uses

### Primary Use

- Automated fact-checking systems
- Misinformation detection pipelines
- Content moderation tools

### Out-of-Scope Uses

- Multilingual fact-checking (the model is English-only)
- Medical or legal claim verification
- Highly domain-specific claims

### How to use

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("your-username/modernbert-factchecking")
model = AutoModelForSequenceClassification.from_pretrained("your-username/modernbert-factchecking")

inputs = tokenizer("Your claim to verify here", return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)

# Class probabilities: index 0 = false, index 1 = true
predictions = torch.softmax(outputs.logits, dim=-1)
predicted_label = predictions.argmax(dim=-1).item()
```

## Training Data

The model was trained on a combination of four datasets:

| Dataset  | Samples | Domain                     |
|----------|---------|----------------------------|
| FELM     | 34,000  | General claims             |
| FEVER    | 145,000 | Wikipedia-based claims     |
| HaluEval | 12,000  | QA hallucination detection |
| LIAR     | 12,800  | Political claims           |

**Total training samples:** ~203,800

## Training Procedure

### Hyperparameters

- Learning Rate: 5e-5
- Batch Size: 32
- Epochs: 1
- Max Sequence Length: 512 tokens
- Optimizer: adamw_torch_fused

### Preprocessing

All datasets were converted to a standardized format:

```python
{
    "text": "full claim text",
    "label": 0.0 or 1.0,
    "source": "dataset_name"
}
```
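As an illustration of this schema, a record from one of the source datasets could be mapped roughly as follows. This is a hypothetical sketch, not the actual conversion pipeline: the raw field names (`statement`, `label`) and the binarization of LIAR's six-way truthfulness ratings are assumptions.

```python
# Illustrative sketch only; the real conversion pipeline is not published.
# Raw field names and the binary label mapping below are assumptions.
def convert_liar_example(raw: dict) -> dict:
    # LIAR rates claims on a six-point scale; one plausible binarization
    # treats "true", "mostly-true", and "half-true" as 1.0, the rest as 0.0.
    truthful = {"true", "mostly-true", "half-true"}
    return {
        "text": raw["statement"],
        "label": 1.0 if raw["label"] in truthful else 0.0,
        "source": "LIAR",
    }

print(convert_liar_example({"statement": "Example political claim.", "label": "mostly-true"}))
# {'text': 'Example political claim.', 'label': 1.0, 'source': 'LIAR'}
```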
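For reference, the hyperparameters listed above map onto Hugging Face `TrainingArguments` roughly as shown below. This is a minimal sketch rather than the actual training script; the output directory is a placeholder, and the maximum sequence length is applied at tokenization time rather than here.

```python
from transformers import TrainingArguments

# Sketch of the stated hyperparameters; "./modernbert-factchecking" is a placeholder path.
training_args = TrainingArguments(
    output_dir="./modernbert-factchecking",
    learning_rate=5e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    optim="adamw_torch_fused",
)
```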