---
library_name: transformers
tags:
- legal
datasets:
- ealvaradob/phishing-dataset
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
base_model:
- distilbert/distilbert-base-uncased
---
|
# 📧 distilbert-finetuned-phishing

A fine-tuned `distilbert-base-uncased` model for phishing email classification. It is designed to distinguish between **safe** and **phishing** emails based on their natural-language content.

[Colab Notebook](https://colab.research.google.com/drive/1_M5BVn9agRHUSN3wBPebfxfOpBqTJcwh?usp=sharing)

---
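For quick experimentation, the checkpoint can be loaded through the `transformers` pipeline API. This is a minimal sketch, not part of the original card: the model id below is a placeholder for this repository's Hub id or a local checkpoint path, and the returned label names depend on the `id2label` mapping saved with the checkpoint.

```python
def load_classifier(model_id="path/to/distilbert-finetuned-phishing"):
    """Build a text-classification pipeline for the fine-tuned checkpoint.

    `model_id` is a placeholder: replace it with this repository's Hub id
    or a local checkpoint directory.
    """
    from transformers import pipeline  # imported lazily so the sketch imports cleanly
    return pipeline("text-classification", model=model_id)


# Example usage (requires the model weights to be available):
# clf = load_classifier()
# clf("Your account has been suspended. Click the link below to verify your password.")
```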
|
|
## 🧪 Evaluation Results

The model was trained on 77,677 emails and evaluated with the following results:
|
| Metric    | Value  |
|-----------|--------|
| Accuracy  | 0.9639 |
| Precision | 0.9648 |
| Recall    | 0.9489 |
| F1 Score  | 0.9568 |
| Eval Loss | 0.1326 |
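As a consistency check (not part of the original card), the reported F1 score is the harmonic mean of the reported precision and recall:

```python
precision = 0.9648
recall = 0.9489

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 4))  # → 0.9568, matching the table
```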
|
---
|
### ⚙️ Training Configuration

```python
import torch
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./hf-phishing-model",
    evaluation_strategy="epoch",  # renamed `eval_strategy` in newer transformers releases
    save_strategy="epoch",        # must match the eval strategy for load_best_model_at_end
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    load_best_model_at_end=True,
    fp16=torch.cuda.is_available(),  # mixed precision only when a GPU is present
)
```
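These arguments are passed to a `Trainer` together with the model and the tokenized datasets. The card does not show that part of the training script, so the following is only a hedged sketch: the parameter names (`model`, `train_ds`, `eval_ds`, `tokenizer`) are assumptions about the surrounding code.

```python
def build_trainer(model, training_args, train_ds, eval_ds, tokenizer):
    """Assemble a Trainer from the TrainingArguments above.

    All parameter names here are assumed; the model card only shows
    the TrainingArguments themselves.
    """
    from transformers import Trainer  # imported lazily so the sketch imports cleanly
    return Trainer(
        model=model,
        args=training_args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        tokenizer=tokenizer,
    )
```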