Spaces: JainilP30/Fake-News-Detector

Adding nltk_data

by aaryan24 - opened Jun 9

base: refs/heads/main ← from: refs/pr/2


+11 -101

Files changed (5)

LICENSE +0 -21
README.md +1 -57
app.py +6 -16
nltk_data.zip +0 -3
requirements.txt +4 -4

LICENSE DELETED

@@ -1,21 +0,0 @@
-MIT License
-Copyright (c) 2025 Jainil Patel
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-The above copyright notice and this permission notice shall be included in
-all copies or substantial portions of the Software.
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
-THE SOFTWARE.

README.md CHANGED

@@ -11,60 +11,4 @@ license: mit
 short_description: 'Detects Fake News using the ensemble of 3 Models '
 ---
-# 📚 Fake News Detector
-**Detects Fake News using an ensemble of 3 Models (Naive Bayes, Logistic Regression, and GloVe-based embeddings)**
----
-## 🚨 Important Disclaimer
-> ⚠️ This project is built purely for **educational and experimental purposes** to explore basic Natural Language Processing (NLP) and Machine Learning (ML) techniques.
->
-> ❗ It is **not suitable for real-world fact-checking or decision-making**.
->
-> The models used are simple, non-contextual, and cannot understand language nuances or factual correctness. Misusing this tool for serious analysis may lead to incorrect or harmful conclusions.
->
-> **Please do not trust or rely on the outputs of this demo.** It is meant for **learning only.**
----
-## 🎯 Purpose
-This project was created as a part of our research internship as a way to:
-- Practice building an ensemble model using different NLP approaches
-- Learn to deploy ML apps with Gradio and Hugging Face Spaces
-- Experiment with basic text classification on news headlines/articles
-It is **not** a robust or reliable system for determining truth or accuracy in media.
----
-## ⚙️ How It Works
-This Fake News Detector uses an ensemble of 3 models:
-1. **Naive Bayes with TF-IDF** – assigns 55% weight
-2. **Logistic Regression** – assigns 10% weight
-3. **GloVe Embedding-Based Classifier** – assigns 35% weight
-Each model contributes a score between 0 and 1 indicating the likelihood of the input text being "Real." The final prediction is based on a weighted average.
----
-## 📄 License & Attribution
-This project is licensed under the **MIT License**.
-### Libraries and Tools Used:
-- 🧠 [GloVe Embeddings by Stanford NLP](https://nlp.stanford.edu/projects/glove/)
-- 🌐 [Gradio Interface Library](https://www.gradio.app/)
-- 📚 [scikit-learn](https://scikit-learn.org/) for model implementation
-- 🛠 [NLTK](https://www.nltk.org/) for basic NLP preprocessing
--    [Dataset](https://www.kaggle.com/datasets/stevenpeutz/misinformation-fake-news-text-dataset-79k)
-## 📦 Installation
-```bash
-pip install -r requirements.txt
-python app.py

 short_description: 'Detects Fake News using the ensemble of 3 Models '
 ---
+Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
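
The weighted average described in the removed README section (and computed in `predict_ensemble` in app.py) can be sketched in isolation. This is a minimal illustration, not the Space's actual code: the function names `ensemble_score` and `label_for` are hypothetical, the weights 0.55 / 0.10 / 0.35 come from the README text, and the 0.45 cutoff is taken from the `+` side of the app.py hunk.

```python
# Hypothetical helper names; weights and threshold are copied from the diff.
WEIGHTS = (0.55, 0.10, 0.35)  # Naive Bayes, Logistic Regression, GloVe classifier
THRESHOLD = 0.45              # scores at or above this are labelled "Real"


def ensemble_score(prob_nb, prob_logreg, prob_glove, weights=WEIGHTS):
    """Return the weighted average of three per-model 'Real' probabilities."""
    w_nb, w_logreg, w_glove = weights
    return w_nb * prob_nb + w_logreg * prob_logreg + w_glove * prob_glove


def label_for(score, threshold=THRESHOLD):
    """Map an ensemble score in [0, 1] to a text label."""
    return "Real" if score >= threshold else "Fake"


if __name__ == "__main__":
    score = ensemble_score(0.8, 0.5, 0.2)
    print(round(score, 2), label_for(score))
```

Because the weights sum to 1, the ensemble score stays in the same 0 to 1 range as the individual model outputs, so a single threshold can be applied to it directly.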

app.py CHANGED

@@ -9,18 +9,9 @@ from nltk.corpus import stopwords
 from nltk.stem import WordNetLemmatizer
 from nltk.tokenize import word_tokenize
 import nltk
-import os
-import zipfile
-# Unzip local nltk_data.zip if not already unzipped
-nltk_data_path = os.path.join(os.path.dirname(__file__), 'nltk_data')
-if not os.path.exists(nltk_data_path):
-    with zipfile.ZipFile('nltk_data.zip', 'r') as zip_ref:
-        zip_ref.extractall(nltk_data_path)
-# Tell NLTK to use the local data path
-nltk.data.path.append(nltk_data_path)
 # ============ Load Models and Tokenizers ============
 with open("logreg_model.pkl", "rb") as f:
@@ -63,7 +54,7 @@ def predict_ensemble(text):
     cleaned = clean_text(text)
     # Check if cleaned text is too short
-    if len(cleaned.strip()) <= 10:
         return "Input too short to analyze."
     # TF-IDF-based predictions
@@ -77,8 +68,8 @@ def predict_ensemble(text):
     prob_glove = model_glove.predict(glove_pad)[0][0]
     # Weighted ensemble
-    ensemble_score = 0.50 * prob_nb + 0.1 * prob_logreg + 0.40 * prob_glove
-    label = "✅ Real News" if ensemble_score >= 0.47 else "❌ Fake News"
     # Optional: Include probabilities
     # Naive Bayes:
@@ -101,7 +92,6 @@ interface = gr.Interface(
     outputs=gr.Markdown(label="Prediction"),
     title="📰 Fake News Detector",
     description="This tool uses 3 models (Naive Bayes, Logistic Regression, GloVe-based Deep Learning) to classify news as real or fake using an ensemble method.",
-    article="⚠️ **Disclaimer:** This demo is for educational and experimental purposes only. It is not suitable for real-world fact-checking or decision-making. Please do not rely on this tool.",
     allow_flagging="never"
 )

 from nltk.stem import WordNetLemmatizer
 from nltk.tokenize import word_tokenize
 import nltk
+nltk.download('punkt')
+nltk.download('stopwords')
+nltk.download('wordnet')
 # ============ Load Models and Tokenizers ============
 with open("logreg_model.pkl", "rb") as f:
     cleaned = clean_text(text)
     # Check if cleaned text is too short
+    if len(cleaned.strip()) <= 10:
         return "Input too short to analyze."
     # TF-IDF-based predictions
     prob_glove = model_glove.predict(glove_pad)[0][0]
     # Weighted ensemble
+    ensemble_score = 0.55 * prob_nb + 0.1 * prob_logreg + 0.35 * prob_glove
+    label = "✅ Real News" if ensemble_score >= 0.45 else "❌ Fake News"
     # Optional: Include probabilities
     # Naive Bayes:
     outputs=gr.Markdown(label="Prediction"),
     title="📰 Fake News Detector",
     description="This tool uses 3 models (Naive Bayes, Logistic Regression, GloVe-based Deep Learning) to classify news as real or fake using an ensemble method.",
     allow_flagging="never"
 )
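
The `+` side of this hunk calls `nltk.download` unconditionally at import time, which can make startup depend on network access. A hedged alternative is to look each resource up first and download only what is missing. The helper name `ensure_nltk_data` and its `find` / `download` parameters are hypothetical; the resource paths assume NLTK's usual `tokenizers/` and `corpora/` layout. With an unpinned `nltk`, recent releases may also need `punkt_tab` for `word_tokenize`, which is an assumption to verify against the installed version.

```python
# Sketch only: names below are illustrative, not part of the Space's app.py.
REQUIRED = {
    "punkt": "tokenizers/punkt",
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
}


def ensure_nltk_data(resources=REQUIRED, find=None, download=None):
    """Download only the NLTK resources that cannot be found locally.

    Returns the list of resource names that had to be downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily so the helper can be exercised with stubs

        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for name, path in resources.items():
        try:
            find(path)  # raises LookupError when the resource is absent
        except LookupError:
            download(name, quiet=True)
            fetched.append(name)
    return fetched
```

Calling `ensure_nltk_data()` once, before the models are loaded, would replace the three `nltk.download(...)` lines; on a warm container it should return an empty list and skip the downloads.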

nltk_data.zip DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e7ca5b931a531c962d2539b042daf7d37badd4ce59523dfa063083f61a1dae72
-size 52292335

requirements.txt CHANGED

@@ -1,5 +1,5 @@
-gradio
-tensorflow
-scikit-learn
-nltk==3.7
 numpy

+gradio
+tensorflow
+scikit-learn
+nltk
 numpy