# 🧠 results – DistilBERT for Malicious Traffic Classification

This model is a fine-tuned version of `distilbert-base-uncased` for binary classification of network traffic. It is intended to distinguish malicious from benign packets based on preprocessed, Wireshark-style text logs.


## 📊 Evaluation Results

| Metric    | Value  |
|-----------|--------|
| Accuracy  | 1.0    |
| Precision | 1.0    |
| Recall    | 1.0    |
| F1 Score  | 1.0    |
| Eval Loss | 0.0000 |

⚠️ These perfect results are on the validation set and may not generalize to unseen or noisy real-world data. Be sure to test on diverse inputs.
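
One quick way to check generalization is to re-score the published checkpoint on a labelled hold-out set of your own. The sketch below assumes a `holdout.csv` file with the `input`/`BinaryLabel` columns described later in this card; both the file name and columns are assumptions, not artifacts shipped with the model.

```python
# Hedged sketch: re-evaluate the checkpoint on your own labelled CSV.
# "holdout.csv" and its 'input'/'BinaryLabel' columns are assumptions.
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import pipeline

clf = pipeline("text-classification", model="TanmaySK/results")

df = pd.read_csv("holdout.csv")
outputs = clf(df["input"].tolist(), truncation=True, batch_size=32)
preds = [int(o["label"].split("_")[-1]) for o in outputs]  # LABEL_0 -> 0, LABEL_1 -> 1

acc = accuracy_score(df["BinaryLabel"], preds)
prec, rec, f1, _ = precision_recall_fscore_support(df["BinaryLabel"], preds, average="binary")
print(f"accuracy={acc:.4f} precision={prec:.4f} recall={rec:.4f} f1={f1:.4f}")
```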


🧩 Model Description

This model uses the lightweight and efficient DistilBERT transformer, fine-tuned for binary classification. Input data should be short text sequences (e.g., protocol descriptions, IP headers, or Wireshark logs).


## 💡 Intended Use & Limitations

### ✅ Intended Uses

- Malicious traffic detection (from packet text)
- Intrusion detection system (IDS) aid
- Sentiment analysis or spam detection (if retrained)

### ❌ Limitations

- English and network-related text only
- Binary classification (0 = benign, 1 = malicious)
- Not trained on raw PCAPs; requires preprocessing (see the sketch below)
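
The exact preprocessing used for training isn't documented here, so the following is only a sketch of one way to turn a tshark/Wireshark CSV export into the packet-text format shown in the usage examples below. The export field names (`ip.src`, `ip.dst`, `_ws.col.Protocol`, `tcp.flags`) and file names are assumptions about your capture, not a documented contract.

```python
# Hedged sketch: build "SrcIP:... DstIP:... Protocol:... Flags:..." strings from a tshark CSV export.
# Example export (field names are assumptions about your capture):
#   tshark -r capture.pcap -T fields -e ip.src -e ip.dst -e _ws.col.Protocol -e tcp.flags \
#          -E header=y -E separator=, > tshark_export.csv
import pandas as pd

def row_to_text(row: pd.Series) -> str:
    # Mirrors the "SrcIP:... DstIP:... Protocol:... Flags:..." style used in the examples below.
    return (
        f"SrcIP:{row.get('ip.src', 'NA')} "
        f"DstIP:{row.get('ip.dst', 'NA')} "
        f"Protocol:{row.get('_ws.col.Protocol', 'NA')} "
        f"Flags:{row.get('tcp.flags', 'NA')}"
    )

raw = pd.read_csv("tshark_export.csv")
raw["input"] = raw.apply(row_to_text, axis=1)
raw[["input"]].to_csv("wireshark_unlabeled.csv", index=False)
print(f"Wrote {len(raw)} rows to wireshark_unlabeled.csv")
```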

πŸ‹οΈ Training Procedure

  • Model: distilbert-base-uncased
  • Framework: Transformers Trainer API
  • Optimizer: AdamW
  • Scheduler: Linear LR decay
  • Epochs: 3
  • Batch Size: 16
  • Seed: 42
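
The original training script isn't included in this repo; the sketch below shows roughly how the hyperparameters above map onto the Trainer API. The CSV paths and the `input`/`BinaryLabel` column names are assumptions based on the dataset description in the next section.

```python
# Hedged sketch of the fine-tuning setup described above, not the author's actual script.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
    set_seed,
)

set_seed(42)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased", num_labels=2)

# train.csv / val.csv with 'input' and 'BinaryLabel' columns are assumptions
ds = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})
ds = ds.map(lambda batch: tokenizer(batch["input"], truncation=True, padding="max_length"), batched=True)
ds = ds.rename_column("BinaryLabel", "labels")

args = TrainingArguments(
    output_dir="results",
    num_train_epochs=3,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    lr_scheduler_type="linear",  # linear LR decay; AdamW is the Trainer default optimizer
    seed=42,
)

trainer = Trainer(model=model, args=args, train_dataset=ds["train"], eval_dataset=ds["validation"])
trainer.train()
print(trainer.evaluate())
```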

## 📊 Training and Evaluation Data

The model was trained on a custom dataset with binary labels:

- `input`: stringified packet details (e.g., IPs, protocol, flags)
- `BinaryLabel`: 0 = benign, 1 = malicious

Text was tokenized using the DistilBERT tokenizer with truncation and padding.
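
For reference, this is roughly what that tokenization step produces for a single packet string; `max_length=512` mirrors the inference example below and is an assumption about the training configuration.

```python
# Illustration of the tokenization step; DistilBERT produces no token_type_ids.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
encoded = tokenizer(
    "SrcIP:10.0.0.1 DstIP:192.168.1.1 Protocol:TCP Flags:SYN",
    truncation=True,
    padding="max_length",
    max_length=512,
)
print(list(encoded.keys()))         # ['input_ids', 'attention_mask']
print(len(encoded["input_ids"]))    # 512 after padding
```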


## 🧪 Example Usage

### 🔌 Hugging Face Pipeline (Single Prediction)

```python
from transformers import pipeline

# Load from Hugging Face Hub
classifier = pipeline("text-classification", model="TanmaySK/results")

# Predict
text = "SrcIP:10.0.0.1 DstIP:192.168.1.1 Protocol:TCP Flags:SYN"
result = classifier(text)

# Interpret label
label_map = {"LABEL_0": "Benign", "LABEL_1": "Malicious"}
print(f"Prediction: {label_map[result[0]['label']]} (Confidence: {result[0]['score']:.4f})")


## πŸ“ CSV Batch Prediction (Local Wireshark Data)
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model
model = AutoModelForSequenceClassification.from_pretrained("TanmaySK/results")
tokenizer = AutoTokenizer.from_pretrained("TanmaySK/results")

# Device setup
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Load CSV
df = pd.read_csv("wireshark_unlabeled.csv")  # Must have 'input' column
label_map = {0: "Benign", 1: "Malicious"}
predictions = []

# Predict each row
for text in df["input"]:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=512)
    # DistilBERT does not use token_type_ids; drop them in case a different tokenizer adds them
    inputs = {k: v.to(device) for k, v in inputs.items() if k != "token_type_ids"}

    with torch.no_grad():
        logits = model(**inputs).logits
        pred = torch.argmax(logits, dim=1).item()
        predictions.append(pred)

# Save results
df["PredictedLabel"] = predictions
df["PredictionText"] = [label_map[p] for p in predictions]
df.to_csv("wireshark_predictions.csv", index=False)
print("βœ… Saved to wireshark_predictions.csv")