Model Card for DistilBERT-PhishGuard
Model Overview
URLShield-DistilBERT is a phishing URL detection model based on DistilBERT, fine-tuned to classify a URL as safe or phishing. It is designed for real-time use in web and email security, where it helps users identify malicious links.
Intended Use
- Use Cases: URL classification for phishing detection in emails, websites, and chat applications.
- Limitations: This model may have reduced accuracy with non-English URLs or heavily obfuscated links.
- Intended Users: Security researchers, application developers, and cybersecurity engineers.
What Sets PhishGuard Apart?
- High Accuracy: Achieved up to 99.6% accuracy and 0.997 AUC on validation datasets.
- Optimized for Speed: Leveraging a distilled transformer model for faster predictions without compromising accuracy.
- Real-World Data: Trained and evaluated on diverse phishing and safe URLs, ensuring robust performance across domains.
Performance Metrics (Averaged Across Epochs)
- Accuracy: 99.6%
- AUC (Area Under Curve): 0.997
- Training Loss: 0.054
- Validation Loss: 0.047
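For readers who want to check figures like these on their own labelled URLs, the sketch below shows one way to compute accuracy and AUC from true labels and predicted phishing scores. It is not the evaluation script behind the numbers above: the function names `accuracy` and `roc_auc`, the 0.5 threshold, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of URLs whose thresholded score matches the true label."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random phishing URL outscores a random safe one.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one phishing and one safe example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (1 = phishing, 0 = safe).
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.92, 0.08, 0.75, 0.40, 0.35, 0.10]

print("accuracy:", accuracy(y_true, y_score))
print("auc:", roc_auc(y_true, y_score))
```

The pairwise AUC loop is quadratic in the number of examples, which is fine for a spot check; a library routine is a better fit for large validation sets.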
Support the Project
If you find this project useful, consider buying me a coffee to support further development!
Usage
This model can be loaded and used with Hugging Face's transformers library:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the model and tokenizer (replace the placeholder repo id with the actual one)
tokenizer = AutoTokenizer.from_pretrained("your-username/DistilBERT-PhishGuard")
model = AutoModelForSequenceClassification.from_pretrained("your-username/DistilBERT-PhishGuard")

# Sample URL for classification
url = "http://example.com"
inputs = tokenizer(url, return_tensors="pt", truncation=True, max_length=256)

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)

# Label 1 is phishing, label 0 is safe
predictions = torch.argmax(outputs.logits, dim=-1)
print("Prediction:", "Phishing" if predictions.item() == 1 else "Safe")
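The snippet above reports only the top class. In a security setting it can be useful to see the phishing probability and choose a decision threshold. The sketch below does this with the standard library only; `softmax` and `label_from_logits` are hypothetical helpers, the example logits are made up, and the 0.5 default threshold is an assumption rather than a value recommended for this model.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_from_logits(logits: Sequence[float], threshold: float = 0.5) -> tuple[str, float]:
    """Return (label, phishing probability).

    Index 1 is treated as the phishing class, matching the usage snippet.
    """
    p_phish = softmax(logits)[1]
    return ("Phishing" if p_phish >= threshold else "Safe"), p_phish


# Hypothetical logits standing in for outputs.logits[0].tolist()
label, prob = label_from_logits([-1.2, 2.3])
print(label, round(prob, 3))
```

Raising the threshold trades missed phishing URLs for fewer false alarms; the right value depends on the deployment and should be chosen on held-out data.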
Performance
The model achieves above 98% accuracy across different chunks of the training data, with an AUC at or near 1.00 in later stages. This indicates robust and reliable phishing detection across varied datasets.
Limitations and Biases
The model's performance may degrade on URLs containing obfuscated or novel phishing techniques. It may be less effective on non-English URLs and may need further fine-tuning for different languages or domain-specific URLs.
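One way to act on these limitations is to flag inputs that are likely to fall outside what the model handles well, so they can be routed to additional checks. The sketch below is a heuristic illustration, not part of the model: `risk_flags` is a hypothetical helper, and the specific signals (non-ASCII characters, punycode hostnames, heavy percent-encoding, raw IP hosts) are assumptions about what "non-English" and "obfuscated" URLs often look like.

```python
from urllib.parse import urlsplit


def risk_flags(url: str) -> list[str]:
    """Return reasons a URL may need extra scrutiny beyond the model's prediction."""
    try:
        host = urlsplit(url).hostname or ""
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. a broken IPv6 literal
        return ["unparseable"]

    flags = []
    if any(ord(ch) > 127 for ch in url):
        flags.append("non-ascii")
    if any(part.startswith("xn--") for part in host.split(".")):
        flags.append("punycode-host")
    if url.count("%") >= 3:  # arbitrary cutoff, chosen for illustration
        flags.append("heavy-percent-encoding")
    if host.replace(".", "").isdigit():
        flags.append("ip-address-host")
    return flags


for candidate in ["http://example.com", "http://xn--pple-43d.com/login"]:
    print(candidate, risk_flags(candidate))
```

A flagged URL is not necessarily malicious; the flags only mark cases where the prediction may deserve less trust.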
Contact and Support
For questions, improvements, or support, please contact us through the Hugging Face community or open an issue in the model repository.
Model tree for Adnan-AI-Labs/URLShield-DistilBERT
Base model
distilbert/distilbert-base-uncased