	---
	language: en
	license: mit
	tags:
	- keras
	- lstm
	- spam-classification
	- text-classification
	- binary-classification
	- email
	- deep-learning
	library_name: keras
	pipeline_tag: text-classification
	model_name: Spam Email Classifier (BiLSTM)
	datasets:
	- SetFit/enron_spam
	---

	# 📧 Spam Email Classifier using BiLSTM

	This model uses a Bidirectional LSTM (BiLSTM) architecture built with Keras to classify email messages as spam or ham (legitimate mail). It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using pretrained GloVe word embeddings.

	---

	## 🧠 Model Architecture

	- Tokenizer: Keras `Tokenizer` trained on the Enron dataset
	- Embedding: Pretrained [GloVe.6B.100d](https://nlp.stanford.edu/projects/glove/)
	- Model: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
	- Input: English email/message text
	- Output: `0 = Ham`, `1 = Spam`
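
The card does not show how the GloVe vectors were wired into the embedding layer. A common approach, sketched here as an assumption rather than the author's actual code, is to build a matrix whose row `i` holds the GloVe vector for the word the tokenizer maps to index `i`. The helper name `build_embedding_matrix` and the toy data are hypothetical. The real `glove.6B.100d.txt` file uses the same `word v1 v2 ...` line format with 100 values per word.

```python
# Sketch only: the model card does not show how the GloVe vectors were loaded.
def build_embedding_matrix(word_index, glove_lines, dim):
    """Return a (len(word_index) + 1) x dim matrix as a list of rows.

    Row 0 stays all zeros because the Keras Tokenizer reserves index 0 for padding.
    Words missing from the GloVe data also keep an all-zero row.
    """
    vectors = {}
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip malformed or wrong-dimension lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]

    matrix = [[0.0] * dim for _ in range(len(word_index) + 1)]
    for word, idx in word_index.items():
        if word in vectors:
            matrix[idx] = vectors[word]
    return matrix


# Toy data with dim=3 standing in for glove.6B.100d.txt (dim=100).
toy_glove = ["free 0.1 0.2 0.3", "win 0.4 0.5 0.6"]
toy_word_index = {"win": 1, "free": 2, "iphone": 3}
matrix = build_embedding_matrix(toy_word_index, toy_glove, dim=3)
print(matrix[1])  # vector for "win"
print(matrix[3])  # all zeros: "iphone" is not in the toy GloVe data
```

In practice the matrix would usually be converted to a NumPy array and used as the initial weights of the `Embedding` layer. Whether those weights were frozen or fine-tuned during training is not stated in this card.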

	---

	## 🧪 Example Usage

	```python
	from tensorflow.keras.models import load_model
	from huggingface_hub import hf_hub_download
	import pickle
	from tensorflow.keras.preprocessing.sequence import pad_sequences

	# Load files from HF Hub
	model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
	tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")

	# Load model and tokenizer
	model = load_model(model_path)
	with open(tokenizer_path, "rb") as f:
	    tokenizer = pickle.load(f)

	# Prediction function
	def predict_spam(text):
	    """Classify a single message as spam or not spam."""
	    seq = tokenizer.texts_to_sequences([text])
	    padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
	    pred = model.predict(padded)[0][0]  # sigmoid output: estimated probability of spam
	    return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"

	# Example
	print(predict_spam("Win a free iPhone now!"))