---
language: en
license: mit
tags:
- keras
- lstm
- spam-classification
- text-classification
- binary-classification
- email
- deep-learning
library_name: keras
pipeline_tag: text-classification
model_name: Spam Email Classifier (BiLSTM)
datasets:
- SetFit/enron_spam
---

# 📧 Spam Email Classifier using BiLSTM

This model uses a **Bidirectional LSTM (BiLSTM)** architecture built with **Keras** to classify email messages as **Spam** or **Ham**. It was trained on the [Enron Spam Dataset](https://huggingface.co/datasets/SetFit/enron_spam) using GloVe word embeddings.

---

## 🧠 Model Architecture

- **Tokenizer**: Keras `Tokenizer` trained on the Enron dataset
- **Embedding**: Pretrained [GloVe.6B.100d](https://nlp.stanford.edu/projects/glove/)
- **Model**: `Embedding → BiLSTM → Dropout → Dense(sigmoid)`
- **Input**: English email/message text
- **Output**: `0 = Ham`, `1 = Spam`

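
The layer stack listed above can be sketched in Keras roughly as follows. This is a minimal sketch, not the author's training code. Only the 100-dimensional embedding (GloVe.6B.100d) and the sequence length of 50 (the `maxlen` in the usage example) come from this card. The vocabulary size, LSTM width, dropout rate, optimizer, and loss are assumptions, and the function name `build_bilstm` is hypothetical. How the pretrained GloVe vectors were loaded into the embedding layer is not stated, so the sketch leaves the embedding randomly initialized.

```python
from tensorflow.keras import layers, models

# Values taken from this card
EMBED_DIM = 100   # GloVe.6B.100d vectors are 100-dimensional
MAX_LEN = 50      # sequence length used by pad_sequences in the usage example

# Assumed values (not stated in the card)
VOCAB_SIZE = 20000
LSTM_UNITS = 64
DROPOUT_RATE = 0.5


def build_bilstm(vocab_size=VOCAB_SIZE, embed_dim=EMBED_DIM, max_len=MAX_LEN,
                 lstm_units=LSTM_UNITS, dropout_rate=DROPOUT_RATE):
    """Hypothetical helper: Embedding -> BiLSTM -> Dropout -> Dense(sigmoid)."""
    model = models.Sequential([
        layers.Input(shape=(max_len,)),
        # Maps each token index to a dense vector; the original model
        # used pretrained GloVe vectors here (loading step not shown).
        layers.Embedding(vocab_size, embed_dim),
        # Reads the sequence in both directions and concatenates the
        # final states, giving 2 * lstm_units features.
        layers.Bidirectional(layers.LSTM(lstm_units)),
        layers.Dropout(dropout_rate),
        # Single sigmoid unit: output near 0 means Ham, near 1 means Spam.
        layers.Dense(1, activation="sigmoid"),
    ])
    # Assumed training configuration for a binary classifier.
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_bilstm().summary()` prints the layer shapes, which is a quick way to check that the output is a single probability per input sequence.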
					
						
---

## 🧪 Example Usage

```python
from tensorflow.keras.models import load_model
from huggingface_hub import hf_hub_download
import pickle
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load files from HF Hub
model_path = hf_hub_download("lokas/spam-emails-classifier", "model.h5")
tokenizer_path = hf_hub_download("lokas/spam-emails-classifier", "tokenizer.pkl")

# Load model and tokenizer
model = load_model(model_path)
with open(tokenizer_path, "rb") as f:
    tokenizer = pickle.load(f)

# Prediction function
def predict_spam(text):
    seq = tokenizer.texts_to_sequences([text])
    padded = pad_sequences(seq, maxlen=50)  # must match training maxlen
    pred = model.predict(padded)[0][0]
    return "🚫 Spam" if pred > 0.5 else "✅ Not Spam"

# Example
print(predict_spam("Win a free iPhone now!"))
