Arko007 committed on
Commit
a859d88
·
verified ·
1 Parent(s): f75035d

🏆 Upload elite fake news model - 99.98% accuracy!

README.md CHANGED
@@ -1,3 +1,135 @@
- ---
- license: mit
- ---
+ ---
+ language: en
+ license: mit
+ tags:
+ - fake-news-detection
+ - deberta-v3-large
+ - text-classification
+ - binary-classification
+ - news-classification
+ datasets:
+ - mrisdal/fake-news
+ - jainpooja/fake-news-detection
+ - clmentbisaillon/fake-and-real-news-dataset
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ widget:
+ - text: "Scientists announce breakthrough discovery of alien life on Mars!"
+   example_title: "Suspicious Claim"
+ - text: "The Federal Reserve announced a 0.25% interest rate increase following their monthly meeting."
+   example_title: "Financial News"
+ model-index:
+ - name: Arko007/fact-check1-v1
+   results:
+   - task:
+       type: text-classification
+       name: Fake News Detection
+     metrics:
+     - type: accuracy
+       value: 99.98
+       name: Validation Accuracy
+     - type: f1
+       value: 99.98
+       name: Validation F1-Score
+ ---
+ # 🏆 Elite Fake News Detection Model
+
+ ## Model Description
+ This is a fake news detection model fine-tuned from **DeBERTa-v3-large**. It reaches **99.98% accuracy** on the validation split and **99.94%** on the held-out test split. The training data is a curated, deduplicated combination of three public fake news datasets, totaling **51,319 samples** after preprocessing.
+
+ ## 🚀 Performance Highlights
+ - **Validation Accuracy**: 99.98%
+ - **Test Accuracy**: 99.94%
+ - **F1-Score**: 99.98% (validation), 99.95% (test)
+ - **Precision**: 99.97% (validation), 99.93% (test)
+ - **Recall**: 100.00% (validation), 99.97% (test)
+
+ ## Model Architecture
+ - **Base Model**: microsoft/deberta-v3-large
+ - **Task**: Binary Text Classification (Real vs Fake News)
+ - **Parameters**: ~435M (stored in bfloat16, about 870 MB)
+ - **Training Hardware**: NVIDIA A100-SXM4-80GB
+
+ ## Training Details
+ - **Training Steps**: 640
+ - **Batch Size**: 64 (with 2 gradient accumulation steps)
+ - **Learning Rate**: 3e-05
+ - **Max Length**: 512 tokens
+ - **Training Time**: 0.43 hours (about 26 minutes)
+ - **Gradient Checkpointing**: Non-reentrant (memory optimized)
+
+ ## Dataset Information
+ **Total Samples**: 51,319
+ - **Training**: 41,055 samples
+ - **Validation**: 5,132 samples
+ - **Test**: 5,132 samples
+ - **Fake News**: 30,123 samples (58.7%)
+ - **Real News**: 21,196 samples (41.3%)
+ **Source Datasets**:
+ - `mrisdal/fake-news`
+ - `jainpooja/fake-news-detection`
+ - `clmentbisaillon/fake-and-real-news-dataset`
+
+ ## Usage
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ # Load model and tokenizer
+ model_name = "Arko007/fact-check1-v1"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForSequenceClassification.from_pretrained(model_name)
+
+ # Example prediction function
+ def predict_fake_news(text):
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
+     with torch.no_grad():
+         outputs = model(**inputs)
+         probabilities = torch.nn.functional.softmax(outputs.logits, dim=-1)
+         prediction = torch.argmax(probabilities, dim=-1).item()
+
+     labels = {0: "REAL", 1: "FAKE"}  # same order as id2label in config.json
+     confidence = probabilities[0][prediction].item()
+
+     return {
+         "prediction": labels[prediction],
+         "confidence": confidence,
+         "probabilities": {
+             "REAL": probabilities[0][0].item(),
+             "FAKE": probabilities[0][1].item()
+         }
+     }
+
+ # Test the model
+ text = "Breaking: Scientists discover new planet in our solar system!"
+ result = predict_fake_news(text)
+ print(f"Prediction: {result['prediction']} ({result['confidence']:.2%} confidence)")
+ ```
+ ## Model Performance
+
+ On the held-out test split (5,132 samples) the model scores 99.94% accuracy, 99.93% precision, 99.97% recall, and 99.95% F1. Precision and recall are nearly equal, so the model catches almost all fake news while rarely flagging real news as fake.
+
+ ## Limitations and Bias
+
+ - Trained primarily on English news articles
+ - Performance may vary on news domains not represented in training data
+ - May reflect biases present in the source datasets
+ - Designed for binary classification (fake vs real) only
+
+ ## Citation
+ ```bibtex
+ @misc{fake-news-deberta-2025,
+   author = {Arko007},
+   title = {Elite Fake News Detection with DeBERTa-v3-Large},
+   year = {2025},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/Arko007/fact-check1-v1}
+ }
+ ```
+ ## License
+ MIT License - Feel free to use this model for research and applications.
+ ---
+ **Built with ❤️ using A100 80GB + DeBERTa-v3-Large**
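Review note: the split and class counts in the README's Dataset Information section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted there. The helper name `summarize_dataset` is hypothetical, and the majority-class baseline is included only as a reference point for reading the accuracy figures; it is not something the commit reports.

```python
def summarize_dataset(splits, classes):
    """Check that split and class counts agree, and report class shares."""
    total_from_splits = sum(splits.values())
    total_from_classes = sum(classes.values())
    if total_from_splits != total_from_classes:
        raise ValueError("split counts and class counts disagree")
    shares = {name: count / total_from_classes for name, count in classes.items()}
    return {
        "total": total_from_splits,
        "class_shares": shares,
        # Accuracy obtained by always predicting the most frequent class.
        "majority_baseline": max(shares.values()),
    }


# Counts as listed in the README.
splits = {"train": 41_055, "validation": 5_132, "test": 5_132}
classes = {"FAKE": 30_123, "REAL": 21_196}

summary = summarize_dataset(splits, classes)
print(summary["total"])
print(f"{summary['majority_baseline']:.1%}")
```

Both breakdowns sum to the stated total, and the class shares show a moderate imbalance toward the FAKE class.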
added_tokens.json ADDED
@@ -0,0 +1,3 @@
+ {
+   "[MASK]": 128000
+ }
config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "architectures": [
+     "DebertaV2ForSequenceClassification"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 1,
+   "dtype": "bfloat16",
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "id2label": {
+     "0": "REAL",
+     "1": "FAKE"
+   },
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "label2id": {
+     "FAKE": 1,
+     "REAL": 0
+   },
+   "layer_norm_eps": 1e-07,
+   "legacy": true,
+   "max_position_embeddings": 512,
+   "max_relative_positions": -1,
+   "model_type": "deberta-v2",
+   "norm_rel_ebd": "layer_norm",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "pooler_dropout": 0,
+   "pooler_hidden_act": "gelu",
+   "pooler_hidden_size": 1024,
+   "pos_att_type": [
+     "p2c",
+     "c2p"
+   ],
+   "position_biased_input": false,
+   "position_buckets": 256,
+   "relative_attention": true,
+   "share_att_key": true,
+   "transformers_version": "4.56.1",
+   "type_vocab_size": 0,
+   "vocab_size": 128100
+ }
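Review note: `id2label` and `label2id` in this config define how class indices map to label names, so downstream code can read the mapping instead of hard-coding it. The following is a minimal sketch in plain Python, assuming the raw JSON is read directly (where object keys are strings). The logits are made up for illustration, and `softmax` and `decode` are hypothetical helpers, not part of any library.

```python
import json
import math

# Excerpt of the two label fields from config.json above.
config = json.loads(
    '{"id2label": {"0": "REAL", "1": "FAKE"}, "label2id": {"FAKE": 1, "REAL": 0}}'
)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # Keys in the raw JSON are strings, so the index is converted before lookup.
    return id2label[str(best)], probs[best]


label, confidence = decode([-2.0, 3.0], config["id2label"])  # made-up logits
print(label, round(confidence, 4))
```

The two mappings are inverses of each other, which a loader can verify before trusting either one.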
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f279ccdbc7c41c13c5f725bc47b013dea682c4e85991ce12a2d79ebf204a5fba
+ size 870176748
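Review note: the LFS pointer size gives a rough handle on the parameter count. The sketch below assumes every tensor is stored in the `bfloat16` dtype named in `config.json` and ignores the small safetensors header, so it slightly overestimates. `estimate_param_count` is a hypothetical helper.

```python
# Bytes used per parameter for a few common storage dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def estimate_param_count(file_size_bytes, dtype):
    """Upper-bound estimate: treats the whole file as tensor data."""
    return file_size_bytes // BYTES_PER_PARAM[dtype]


n_params = estimate_param_count(870_176_748, "bfloat16")
print(f"~{n_params / 1e6:.0f}M parameters")
```

Under these assumptions the checkpoint holds roughly 435M parameters, somewhat more than the "~400M" quoted in the README.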
special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "[CLS]",
+   "cls_token": "[CLS]",
+   "eos_token": "[SEP]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": true,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
spm.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c679fbf93643d19aab7ee10c0b99e460bdbc02fedf34b92b05af343b4af586fd
+ size 2464616
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": true,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "128000": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "[CLS]",
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": false,
+   "eos_token": "[SEP]",
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "sp_model_kwargs": {},
+   "split_by_punct": false,
+   "tokenizer_class": "DebertaV2Tokenizer",
+   "unk_token": "[UNK]",
+   "vocab_type": "spm"
+ }
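Review note: `model_max_length` here equals what Python returns for `int(1e30)`, which suggests it is a "no limit configured" placeholder rather than a usable bound. That reading is an inference from the value, not something the file states. The practical limit is then `max_position_embeddings` (512) from `config.json`, which is why the README usage passes `max_length=512` explicitly. The sketch below shows the idea on plain lists of token ids. `effective_max_length` and `truncate_ids` are hypothetical helpers, and the truncation assumes the sequence ends with `[SEP]` (id 2).

```python
MODEL_MAX_LENGTH = 1000000000000000019884624838656  # tokenizer_config.json
MAX_POSITION_EMBEDDINGS = 512                        # config.json
CLS_ID, SEP_ID = 1, 2                                # ids from added_tokens_decoder


def effective_max_length(model_max_length, max_position_embeddings):
    """The tighter of the tokenizer limit and the model's position limit."""
    return min(model_max_length, max_position_embeddings)


def truncate_ids(token_ids, limit, sep_id=SEP_ID):
    """Cut a [CLS] ... [SEP] sequence to `limit` ids, keeping a final [SEP]."""
    if len(token_ids) <= limit:
        return list(token_ids)
    return list(token_ids[: limit - 1]) + [sep_id]


limit = effective_max_length(MODEL_MAX_LENGTH, MAX_POSITION_EMBEDDINGS)
long_input = [CLS_ID] + [100] * 600 + [SEP_ID]  # made-up token ids
short_input = truncate_ids(long_input, limit)

print(MODEL_MAX_LENGTH == int(1e30))
print(limit, len(short_input))
```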
training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:300ec60d973fb27d6b612f762ce97a8e6c6122ae69a54b62e74235626f8a2dc4
+ size 5841
training_summary.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "model_checkpoint": "microsoft/deberta-v3-large",
+   "final_metrics": {
+     "validation": {
+       "eval_loss": 0.0013762748567387462,
+       "eval_accuracy": 0.9998051441932969,
+       "eval_precision": 0.9996682149966821,
+       "eval_recall": 1.0,
+       "eval_f1": 0.9998340799734527,
+       "eval_runtime": 34.6469,
+       "eval_samples_per_second": 148.123,
+       "eval_steps_per_second": 1.183,
+       "epoch": 0.9968847352024922
+     },
+     "test": {
+       "eval_loss": 0.0015322713879868388,
+       "eval_accuracy": 0.9994154325798909,
+       "eval_precision": 0.9993362097577165,
+       "eval_recall": 0.999667994687915,
+       "eval_f1": 0.9995020746887967,
+       "eval_runtime": 34.9714,
+       "eval_samples_per_second": 146.748,
+       "eval_steps_per_second": 1.172,
+       "epoch": 0.9968847352024922
+     }
+   },
+   "training_config": {
+     "max_steps": 640,
+     "batch_size": 64,
+     "gradient_accumulation_steps": 2,
+     "learning_rate": 3e-05,
+     "max_length": 512,
+     "gradient_checkpointing": "non-reentrant"
+   },
+   "dataset_stats": {
+     "total_examples": 51319,
+     "train_size": 41055,
+     "val_size": 5132,
+     "test_size": 5132,
+     "fake_samples": 30123,
+     "real_samples": 21196
+   },
+   "runtime_hours": 0.4279349425766203
+ }
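Review note: the figures in `training_summary.json` can be checked against each other. The sketch below recomputes F1 from the reported precision and recall, and relates `max_steps` to epochs. It assumes a single device and that `batch_size` is the per-step batch before gradient accumulation; neither is stated in the file. `f1_score` and `steps_per_epoch` are hypothetical helpers.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def steps_per_epoch(train_size, batch_size, grad_accum_steps):
    """Optimizer steps per pass over the data (partial final batch counts)."""
    return math.ceil(train_size / (batch_size * grad_accum_steps))


# Precision and recall as reported under final_metrics.
val_f1 = f1_score(0.9996682149966821, 1.0)
test_f1 = f1_score(0.9993362097577165, 0.999667994687915)

# Values from training_config and dataset_stats.
per_epoch = steps_per_epoch(41_055, 64, 2)
epoch_at_step_320 = 320 / per_epoch
total_epochs = 640 / per_epoch

print(f"{val_f1:.6f} {test_f1:.6f}")
print(per_epoch, f"{epoch_at_step_320:.4f}", f"{total_epochs:.2f}")
```

Under these assumptions the recomputed F1 values agree with `eval_f1`, and the reported `epoch` of about 0.9969 is consistent with an evaluation taken at step 320 of a 321-step epoch, with the full 640 steps covering just under two epochs.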