title: TruthCheck - Fake News Detection
emoji: 📰
colorFrom: red
colorTo: blue
sdk: streamlit
sdk_version: 1.28.1
app_file: app.py
pinned: false
license: mit

TruthCheck: Fake News Detection with Fine-Tuned BERT

TruthCheck is a fake news detection system built on a hybrid deep learning architecture: a pre-trained BERT-base-uncased encoder followed by a BiLSTM and an attention mechanism, fully fine-tuned on a curated dataset of real and fake news. The project covers preprocessing, feature extraction, model training, evaluation, and a Streamlit web app for interactive predictions.


🚀 Features

  • Hybrid Model: BERT-base-uncased + BiLSTM + Attention
  • Full Fine-Tuning: All layers of BERT and additional layers are trainable and optimized on the fake news dataset
  • Comprehensive Preprocessing: Cleaning, tokenization, lemmatization, and more
  • Training & Evaluation: Scripts for training, validation, and test evaluation
  • Interactive App: Streamlit web app for real-time news classification
  • Ready for Deployment: Easily extendable for research or production
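
As an illustration of the cleaning and tokenization steps listed above, here is a minimal standard-library sketch. The function names `clean_text` and `tokenize` and the exact cleaning rules are assumptions made for this example, not the project's actual code. Lemmatization is left out because it needs an external NLP library.

```python
import html
import re

# Hypothetical cleaning rules for illustration; the project's own
# preprocessing may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
TAG_RE = re.compile(r"<[^>]+>")
NON_WORD_RE = re.compile(r"[^a-z0-9\s']")
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Lowercase, strip HTML tags, URLs and punctuation, collapse whitespace."""
    text = html.unescape(text)         # decode entities such as "&amp;"
    text = TAG_RE.sub(" ", text)       # drop HTML tags
    text = URL_RE.sub(" ", text)       # drop links
    text = text.lower()
    text = NON_WORD_RE.sub(" ", text)  # keep letters, digits, apostrophes
    return SPACE_RE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Clean the text, then split it on whitespace."""
    return clean_text(text).split()
```

Note that BERT applies its own WordPiece tokenization, so a whitespace tokenizer like this would only serve auxiliary feature extraction.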

🧠 Model Details

  • Base Model: BERT-base-uncased
  • Architecture:
    • BERT encoder (pre-trained, all layers fine-tuned)
    • BiLSTM layer for sequential context
    • Attention mechanism for interpretability
    • Fully connected classification head
  • Fine-Tuning Technique:
    • All BERT layers are unfrozen and updated during training (full fine-tuning)
    • Additional layers (BiLSTM, attention, classifier) are trained from scratch
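
The layers stacked on top of the encoder could look roughly like the PyTorch sketch below. It is an assumption-laden illustration, not the project's implementation: the class name `BiLSTMAttentionHead`, the layer sizes, the dropout rate, and the choice of a single-layer additive attention score are all hypothetical. The module takes encoder hidden states as input, so it can be tried without downloading BERT.

```python
import torch
from torch import nn


class BiLSTMAttentionHead(nn.Module):
    """BiLSTM + attention pooling + linear classifier (illustrative sizes).

    Expects encoder outputs shaped (batch, seq_len, hidden_size), for
    example the last hidden states of BERT-base (hidden_size=768).
    """

    def __init__(self, hidden_size=768, lstm_size=256, num_classes=2, dropout=0.1):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, lstm_size,
                            batch_first=True, bidirectional=True)
        self.attn_score = nn.Linear(2 * lstm_size, 1)  # one score per token
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(2 * lstm_size, num_classes)

    def forward(self, hidden_states, attention_mask):
        # Simplification: the LSTM also runs over padded positions.
        lstm_out, _ = self.lstm(hidden_states)          # (B, T, 2*lstm_size)
        scores = self.attn_score(lstm_out).squeeze(-1)  # (B, T)
        # Padding tokens get zero attention weight after the softmax.
        scores = scores.masked_fill(attention_mask == 0, float("-inf"))
        weights = torch.softmax(scores, dim=-1)         # (B, T)
        # Weighted sum of LSTM outputs gives one vector per example.
        context = torch.bmm(weights.unsqueeze(1), lstm_out).squeeze(1)
        logits = self.classifier(self.dropout(context))
        return logits, weights
```

If the encoder comes from the Hugging Face `transformers` library (an assumption), `hidden_states` would be the `last_hidden_state` returned by `BertModel`. Full fine-tuning then means that the encoder's parameters are passed to the optimizer together with this head's. The returned `weights` can be inspected per token, which is what makes the attention layer useful for interpretability.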

📥 Download Data and Model

Raw and Processed Datasets:
Google Drive Link

Trained Model(s):
Google Drive Link

Instructions:

  1. Download the datasets and place them in the data/ directory:
    • data/raw/ for raw files
    • data/processed/ for processed files
  2. Download the trained model (e.g., final_model.pt or best_model.pt) and place it in models/saved/.
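
The directory layout described above can be created and checked with a short helper. This is a sketch using only the standard library; `prepare_layout` is a hypothetical name and is not part of the repository. The directory and checkpoint names are the ones mentioned in the instructions.

```python
from pathlib import Path

# Paths taken from the instructions above; the checkpoint names are the
# two examples mentioned there.
EXPECTED_DIRS = ("data/raw", "data/processed", "models/saved")
CHECKPOINT_NAMES = ("final_model.pt", "best_model.pt")


def prepare_layout(root="."):
    """Create the expected directories and return any checkpoints found."""
    root = Path(root)
    for rel in EXPECTED_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    saved = root / "models" / "saved"
    return [saved / name for name in CHECKPOINT_NAMES
            if (saved / name).is_file()]
```

Calling `prepare_layout()` from the repository root creates any missing folders. An empty return value means no checkpoint has been placed in models/saved/ yet.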

⚙️ Setup

  1. Clone the repository:
    git clone https://github.com/adnaan-tariq/fake-news-detection.git
    cd fake-news-detection
    
  2. Create and activate a virtual environment (the activation command shown is for Windows; on macOS/Linux use source venv/bin/activate instead):
    python -m venv venv
    .\venv\Scripts\activate
    
  3. Install dependencies:
    pip install --upgrade pip
    pip install -r requirements.txt
    

🏃‍♂️ Usage

Train the Model

To train the model yourself instead of using the downloaded checkpoint (after placing the data as described above):

python -m src.train

Run the Streamlit App

streamlit run app.py

Test the Model

  • The app and scripts will use the model in models/saved/final_model.pt by default.
  • For custom inference, see the example in src/app.py or ask for a sample script.
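
For custom inference, the last step is usually to turn the classifier's raw outputs into a label and a confidence. The sketch below assumes the model emits two logits per article and that index 0 means "real" and index 1 means "fake". Both are assumptions for illustration, so check the training code for the actual label order. The helper names are hypothetical.

```python
import math

# Hypothetical label order: index 0 = "real", index 1 = "fake".
# Verify against the training code before relying on it.
LABELS = ("real", "fake")


def softmax(logits):
    """Numerically stable softmax over a sequence of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpret(logits, labels=LABELS):
    """Return (label, confidence) for one example's raw logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For example, `interpret([-1.0, 2.0])` returns the label "fake" with a confidence of about 0.95.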

📊 Results

  • Validation Accuracy: ~93%
  • Validation F1 Score: ~0.93
  • (See training logs and visualizations for more details.)
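
For reference, accuracy and F1 can be computed from true and predicted labels as in the sketch below. It treats one class as the positive class. The README does not say whether the reported F1 is binary, macro, or weighted, so this is only one possible reading, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```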

📦 Data & Model Policy

  • Data and model files are NOT included in this repository.
  • Please download them from the provided Google Drive links above.

🀝 Contributing

Pull requests and suggestions are welcome! For major changes, please open an issue first to discuss what you would like to change.


📄 License

This project is licensed under the MIT License.