🏆 Upload elite fake news model - 99.98% accuracy!
- README.md +135 -3
- added_tokens.json +3 -0
- config.json +45 -0
- model.safetensors +3 -0
- special_tokens_map.json +15 -0
- spm.model +3 -0
- tokenizer.json +0 -0
- tokenizer_config.json +59 -0
- training_args.bin +3 -0
- training_summary.json +44 -0
README.md
CHANGED
@@ -1,3 +1,135 @@
---
language: en
license: mit
tags:
- fake-news-detection
- deberta-v3-large
- text-classification
- binary-classification
- news-classification
datasets:
- mrisdal/fake-news
- jainpooja/fake-news-detection
- clmentbisaillon/fake-and-real-news-dataset
metrics:
- accuracy
- f1
- precision
- recall
widget:
- text: "Scientists announce breakthrough discovery of alien life on Mars!"
  example_title: "Suspicious Claim"
- text: "The Federal Reserve announced a 0.25% interest rate increase following their monthly meeting."
  example_title: "Financial News"
model-index:
- name: Arko007/fact-check1-v1
  results:
  - task:
      type: text-classification
      name: Fake News Detection
    metrics:
    - type: accuracy
      value: 99.98
      name: Validation Accuracy
    - type: f1
      value: 99.98
      name: Validation F1-Score
---

# 🏆 Elite Fake News Detection Model

## Model Description
This is a **state-of-the-art** fake news detection model based on **DeBERTa-v3-large**, achieving **99.98% accuracy** on validation data. The model was fine-tuned on a carefully curated and deduplicated dataset combining multiple high-quality fake news datasets, totaling **51,319 samples** after preprocessing.

## 🚀 Performance Highlights
- **Validation Accuracy**: 99.98%
- **Test Accuracy**: 99.94%
- **F1-Score**: 99.98%
- **Precision**: 99.97%
- **Recall**: 100.00%

## Model Architecture
- **Base Model**: microsoft/deberta-v3-large
- **Task**: Binary Text Classification (Real vs. Fake News)
- **Parameters**: ~400M
- **Training Hardware**: NVIDIA A100-SXM4-80GB

## Training Details
- **Training Steps**: 640
- **Batch Size**: 64
- **Learning Rate**: 3e-05
- **Max Length**: 512 tokens
- **Training Time**: 0.43 hours
- **Gradient Checkpointing**: Non-reentrant (memory optimized)

## Dataset Information
**Total Samples**: 51,319
- **Training**: 41,055 samples
- **Validation**: 5,132 samples
- **Test**: 5,132 samples
- **Fake News**: 30,123 samples
- **Real News**: 21,196 samples

**Source Datasets**:
- `mrisdal/fake-news`
- `jainpooja/fake-news-detection`
- `clmentbisaillon/fake-and-real-news-dataset`

## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "Arko007/fact-check1-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for deterministic inference

# Example prediction function
def predict_fake_news(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
    prediction = torch.argmax(probabilities, dim=-1).item()

    labels = {0: "REAL", 1: "FAKE"}
    confidence = probabilities[0][prediction].item()

    return {
        "prediction": labels[prediction],
        "confidence": confidence,
        "probabilities": {
            "REAL": probabilities[0][0].item(),
            "FAKE": probabilities[0][1].item()
        }
    }

# Test the model
text = "Breaking: Scientists discover new planet in our solar system!"
result = predict_fake_news(text)
print(f"Prediction: {result['prediction']} ({result['confidence']:.2%} confidence)")
```

## Model Performance

This model achieves **research-grade performance** on fake news detection, with near-perfect accuracy across all metrics. The high precision and recall indicate an excellent balance between catching fake news and avoiding false positives on real news.

## Limitations and Bias

- Trained primarily on English news articles
- Performance may vary on news domains not represented in the training data
- May reflect biases present in the source datasets
- Designed for binary classification (fake vs. real) only

## Citation
```bibtex
@misc{fake-news-deberta-2025,
  author = {Arko007},
  title = {Elite Fake News Detection with DeBERTa-v3-Large},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Arko007/fact-check1-v1}
}
```

## License
MIT License - feel free to use this model for research and applications.

---
**Built with ❤️ using A100 80GB + DeBERTa-v3-Large**
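For quick sanity checks without the custom helper in the model card, the same checkpoint should also work through the high-level `pipeline` API; a minimal sketch, assuming the hosted weights and the `id2label` mapping ("REAL"/"FAKE") load as published:

```python
from transformers import pipeline

# Minimal sketch: the label names come from id2label in config.json below.
classifier = pipeline("text-classification", model="Arko007/fact-check1-v1")

result = classifier("The Federal Reserve announced a 0.25% interest rate increase.")
print(result)  # e.g. [{'label': 'REAL', 'score': 0.99...}]; the exact score will vary
```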
added_tokens.json
ADDED
@@ -0,0 +1,3 @@
{
  "[MASK]": 128000
}
config.json
ADDED
@@ -0,0 +1,45 @@
{
  "architectures": [
    "DebertaV2ForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 1,
  "dtype": "bfloat16",
  "eos_token_id": 2,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 1024,
  "id2label": {
    "0": "REAL",
    "1": "FAKE"
  },
  "initializer_range": 0.02,
  "intermediate_size": 4096,
  "label2id": {
    "FAKE": 1,
    "REAL": 0
  },
  "layer_norm_eps": 1e-07,
  "legacy": true,
  "max_position_embeddings": 512,
  "max_relative_positions": -1,
  "model_type": "deberta-v2",
  "norm_rel_ebd": "layer_norm",
  "num_attention_heads": 16,
  "num_hidden_layers": 24,
  "pad_token_id": 0,
  "pooler_dropout": 0,
  "pooler_hidden_act": "gelu",
  "pooler_hidden_size": 1024,
  "pos_att_type": [
    "p2c",
    "c2p"
  ],
  "position_biased_input": false,
  "position_buckets": 256,
  "relative_attention": true,
  "share_att_key": true,
  "transformers_version": "4.56.1",
  "type_vocab_size": 0,
  "vocab_size": 128100
}
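The architecture settings above can be checked programmatically before committing to a full weights download; a minimal sketch using the standard `AutoConfig` loader (the printed values are taken from the file above):

```python
from transformers import AutoConfig

# Fetches only the small config.json, not the 870 MB checkpoint.
config = AutoConfig.from_pretrained("Arko007/fact-check1-v1")

print(config.model_type)                             # deberta-v2
print(config.id2label)                               # {0: 'REAL', 1: 'FAKE'}
print(config.hidden_size, config.num_hidden_layers)  # 1024 24
print(config.max_position_embeddings)                # 512
```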
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f279ccdbc7c41c13c5f725bc47b013dea682c4e85991ce12a2d79ebf204a5fba
size 870176748
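The three lines above are a Git LFS pointer, not the weights themselves: `oid` is the SHA-256 of the actual ~870 MB file. A minimal sketch for verifying a downloaded copy against the pointer (the local path is an assumption):

```python
import hashlib

EXPECTED = "f279ccdbc7c41c13c5f725bc47b013dea682c4e85991ce12a2d79ebf204a5fba"

# Stream in 1 MiB chunks so the full checkpoint never sits in memory.
digest = hashlib.sha256()
with open("model.safetensors", "rb") as f:  # assumed local download path
    for chunk in iter(lambda: f.read(1 << 20), b""):
        digest.update(chunk)

assert digest.hexdigest() == EXPECTED, "checksum mismatch: re-download the file"
```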
special_tokens_map.json
ADDED
@@ -0,0 +1,15 @@
{
  "bos_token": "[CLS]",
  "cls_token": "[CLS]",
  "eos_token": "[SEP]",
  "mask_token": "[MASK]",
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "unk_token": {
    "content": "[UNK]",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
spm.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
size 2464616
tokenizer.json
ADDED
The diff for this file is too large to render. See the raw diff.
tokenizer_config.json
ADDED
@@ -0,0 +1,59 @@
{
  "added_tokens_decoder": {
    "0": {
      "content": "[PAD]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "1": {
      "content": "[CLS]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "2": {
      "content": "[SEP]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "3": {
      "content": "[UNK]",
      "lstrip": false,
      "normalized": true,
      "rstrip": false,
      "single_word": false,
      "special": true
    },
    "128000": {
      "content": "[MASK]",
      "lstrip": false,
      "normalized": false,
      "rstrip": false,
      "single_word": false,
      "special": true
    }
  },
  "bos_token": "[CLS]",
  "clean_up_tokenization_spaces": false,
  "cls_token": "[CLS]",
  "do_lower_case": false,
  "eos_token": "[SEP]",
  "extra_special_tokens": {},
  "mask_token": "[MASK]",
  "model_max_length": 1000000000000000019884624838656,
  "pad_token": "[PAD]",
  "sep_token": "[SEP]",
  "sp_model_kwargs": {},
  "split_by_punct": false,
  "tokenizer_class": "DebertaV2Tokenizer",
  "unk_token": "[UNK]",
  "vocab_type": "spm"
}
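Note that `model_max_length` above is the transformers sentinel for "no limit configured", while `max_position_embeddings` in config.json is 512, so it is safest to pass the truncation length explicitly; a minimal sketch, assuming the tokenizer files load via the standard `AutoTokenizer`:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Arko007/fact-check1-v1")

# model_max_length is effectively unbounded here, so cap inputs at the
# model's 512 positions ourselves.
enc = tokenizer("Some headline to classify.", truncation=True, max_length=512)
print(tokenizer.convert_ids_to_tokens(enc["input_ids"])[:3])  # starts with '[CLS]'
```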
training_args.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:300ec60d973fb27d6b612f762ce97a8e6c6122ae69a54b62e74235626f8a2dc4
size 5841
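This pointer covers `training_args.bin`, which is a pickled `TrainingArguments` object rather than a tensor file; a hedged sketch for inspecting it, assuming a local copy, a compatible transformers install, and trust in the repository (unpickling executes arbitrary code):

```python
import torch

# weights_only=False is required because this is a pickled Python object,
# not a plain tensor checkpoint; only do this for repositories you trust.
args = torch.load("training_args.bin", weights_only=False)  # assumed local path
print(args.learning_rate, args.per_device_train_batch_size)
```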
training_summary.json
ADDED
@@ -0,0 +1,44 @@
{
  "model_checkpoint": "microsoft/deberta-v3-large",
  "final_metrics": {
    "validation": {
      "eval_loss": 0.0013762748567387462,
      "eval_accuracy": 0.9998051441932969,
      "eval_precision": 0.9996682149966821,
      "eval_recall": 1.0,
      "eval_f1": 0.9998340799734527,
      "eval_runtime": 34.6469,
      "eval_samples_per_second": 148.123,
      "eval_steps_per_second": 1.183,
      "epoch": 0.9968847352024922
    },
    "test": {
      "eval_loss": 0.0015322713879868388,
      "eval_accuracy": 0.9994154325798909,
      "eval_precision": 0.9993362097577165,
      "eval_recall": 0.999667994687915,
      "eval_f1": 0.9995020746887967,
      "eval_runtime": 34.9714,
      "eval_samples_per_second": 146.748,
      "eval_steps_per_second": 1.172,
      "epoch": 0.9968847352024922
    }
  },
  "training_config": {
    "max_steps": 640,
    "batch_size": 64,
    "gradient_accumulation_steps": 2,
    "learning_rate": 3e-05,
    "max_length": 512,
    "gradient_checkpointing": "non-reentrant"
  },
  "dataset_stats": {
    "total_examples": 51319,
    "train_size": 41055,
    "val_size": 5132,
    "test_size": 5132,
    "fake_samples": 30123,
    "real_samples": 21196
  },
  "runtime_hours": 0.4279349425766203
}
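The rounded percentages in the model card come from this file; a minimal sketch that recomputes them from the raw values (assuming a local copy of `training_summary.json`):

```python
import json

with open("training_summary.json") as f:  # assumed local path
    summary = json.load(f)

val = summary["final_metrics"]["validation"]
print(f"val accuracy: {val['eval_accuracy']:.2%}")  # 99.98%, as in the README
print(f"val f1:       {val['eval_f1']:.2%}")        # 99.98%
```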