# results – DistilBERT for Malicious Traffic Classification
This model is a fine-tuned version of `distilbert-base-uncased` for binary classification of network traffic. It is intended for distinguishing malicious from benign packets based on preprocessed Wireshark-style logs.
## Evaluation Results
| Metric | Value |
|---|---|
| Accuracy | 1.0 |
| Precision | 1.0 |
| Recall | 1.0 |
| F1 Score | 1.0 |
| Eval Loss | 0.0000 |
> **Note:** These perfect results are on the validation set and may not generalize to unseen or noisy real-world data. Be sure to test on diverse inputs.
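To check generalization on your own labeled data, a minimal sketch like the following recomputes the same metrics; the CSV path is hypothetical and the file is assumed to follow the `input` / `BinaryLabel` schema described below.

```python
# Minimal evaluation sketch, assuming a labeled CSV with `input` and
# `BinaryLabel` columns. The file name is hypothetical.
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import pipeline

classifier = pipeline("text-classification", model="TanmaySK/results")

df = pd.read_csv("my_labeled_traffic.csv")  # hypothetical labeled file
preds = classifier(df["input"].tolist(), truncation=True, max_length=512)
y_pred = [int(p["label"].split("_")[-1]) for p in preds]  # LABEL_0 / LABEL_1 -> 0 / 1

acc = accuracy_score(df["BinaryLabel"], y_pred)
prec, rec, f1, _ = precision_recall_fscore_support(
    df["BinaryLabel"], y_pred, average="binary"
)
print(f"Accuracy {acc:.4f}  Precision {prec:.4f}  Recall {rec:.4f}  F1 {f1:.4f}")
```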
## Model Description
This model uses the lightweight and efficient DistilBERT transformer, fine-tuned for binary classification. Input data should be short text sequences (e.g., protocol descriptions, IP headers, or Wireshark logs).
## Intended Use & Limitations
### Intended Uses
- Malicious traffic detection (from packet text)
- Intrusion detection system (IDS) aid
- Sentiment analysis or spam detection (if retrained)
### Limitations
- English and network-related text only
- Binary classification (0 = benign, 1 = malicious)
- Not trained on raw PCAPs; packets must be preprocessed into short text strings (see the sketch below)
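One possible preprocessing step is sketched here; it is not the author's pipeline. It assumes a Wireshark CSV export with the default `Source`, `Destination`, `Protocol`, and `Info` columns, and the file names are hypothetical.

```python
# Preprocessing sketch (not the author's pipeline): turn a Wireshark CSV
# export into the short text strings this model expects. Column names follow
# Wireshark's default CSV export and may need adjusting for your capture.
import pandas as pd

raw = pd.read_csv("capture_export.csv")  # hypothetical Wireshark CSV export

def to_text(row):
    # Mirror the input style shown in the usage example below.
    return (f"SrcIP:{row['Source']} DstIP:{row['Destination']} "
            f"Protocol:{row['Protocol']} Info:{row['Info']}")

pd.DataFrame({"input": raw.apply(to_text, axis=1)}).to_csv(
    "wireshark_unlabeled.csv", index=False
)
```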
## Training Procedure
- Model: distilbert-base-uncased
- Framework: Transformers Trainer API
- Optimizer: AdamW
- Scheduler: Linear LR decay
- Epochs: 3
- Batch Size: 16
- Seed: 42
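A minimal fine-tuning sketch under these hyperparameters is shown below. The CSV file names are hypothetical, and the learning rate is left at the Trainer default because the card does not state it.

```python
# Sketch only: reproduces the listed hyperparameters (3 epochs, batch size 16,
# seed 42, AdamW with linear LR decay, which are the Trainer defaults).
# The data files are hypothetical CSVs with `input` / `BinaryLabel` columns.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments, set_seed)

set_seed(42)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

ds = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})

def tokenize(batch):
    # Truncation and padding as described in the data section below.
    return tokenizer(batch["input"], truncation=True,
                     padding="max_length", max_length=512)

ds = ds.map(tokenize, batched=True).rename_column("BinaryLabel", "labels")

args = TrainingArguments(
    output_dir="results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",  # linear decay of the learning rate
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds["train"],
    eval_dataset=ds["validation"],
)
trainer.train()
```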
## Training and Evaluation Data
The model was trained on a custom dataset with binary labels:

- `input`: stringified packet details (e.g., IPs, protocol, flags)
- `BinaryLabel`: 0 = benign, 1 = malicious
Text was tokenized using the DistilBERT tokenizer with truncation and padding.
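The same tokenization can be reproduced directly. This is a sketch: the card does not state whether padding was dynamic or to the full 512-token maximum, so `padding="max_length"` here is an assumption.

```python
# Sketch: reproduce the truncation/padding described above on one packet string.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("TanmaySK/results")
encoded = tokenizer(
    "SrcIP:10.0.0.1 DstIP:192.168.1.1 Protocol:TCP Flags:SYN",
    truncation=True, padding="max_length", max_length=512,
)
print(len(encoded["input_ids"]))  # 512 token IDs after padding to max_length
```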
## Example Usage
### Hugging Face Pipeline (Single Prediction)
```python
from transformers import pipeline

# Load from the Hugging Face Hub
classifier = pipeline("text-classification", model="TanmaySK/results")

# Predict
text = "SrcIP:10.0.0.1 DstIP:192.168.1.1 Protocol:TCP Flags:SYN"
result = classifier(text)

# Interpret label
label_map = {"LABEL_0": "Benign", "LABEL_1": "Malicious"}
print(f"Prediction: {label_map[result[0]['label']]} (Confidence: {result[0]['score']:.4f})")
```
### CSV Batch Prediction (Local Wireshark Data)
```python
import pandas as pd
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
model = AutoModelForSequenceClassification.from_pretrained("TanmaySK/results")
tokenizer = AutoTokenizer.from_pretrained("TanmaySK/results")

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Load CSV (must have an 'input' column)
df = pd.read_csv("wireshark_unlabeled.csv")

label_map = {0: "Benign", 1: "Malicious"}
predictions = []

# Predict each row
for text in df["input"]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    inputs = {k: v.to(device) for k, v in inputs.items() if k != "token_type_ids"}
    with torch.no_grad():
        logits = model(**inputs).logits
    pred = torch.argmax(logits, dim=1).item()
    predictions.append(pred)

# Save results
df["PredictedLabel"] = predictions
df["PredictionText"] = [label_map[p] for p in predictions]
df.to_csv("wireshark_predictions.csv", index=False)
print("Saved to wireshark_predictions.csv")
```
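For larger captures, the per-row loop can be replaced by batched tokenization. The sketch below reuses the `model`, `tokenizer`, `device`, and `df` from the script above; the batch size of 32 is an arbitrary choice, not a value from this card.

```python
# Batched variant (sketch): classify rows in chunks instead of one at a time.
batch_size = 32
predictions = []
texts = df["input"].tolist()

for start in range(0, len(texts), batch_size):
    batch = texts[start:start + batch_size]
    enc = tokenizer(batch, return_tensors="pt", truncation=True,
                    padding=True, max_length=512)
    enc = {k: v.to(device) for k, v in enc.items() if k != "token_type_ids"}
    with torch.no_grad():
        logits = model(**enc).logits
    predictions.extend(torch.argmax(logits, dim=1).cpu().tolist())
```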