Wazuh SecRoBERTa Security Log Classifier

Model Description

This is a fine-tuned SecRoBERTa model for classifying Wazuh security logs into three categories:

  • Benign (0): Normal, safe activities
  • Suspicious (1): Potentially concerning activities that require monitoring
  • Malicious (2): Confirmed threats requiring immediate action

The model is based on jackaduma/SecRoBERTa and fine-tuned with LoRA (Low-Rank Adaptation), which trains small low-rank adapter matrices instead of updating every base-model weight.
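
To give a rough sense of why LoRA is parameter-efficient, the sketch below compares the number of trainable parameters for a full update of one weight matrix against a low-rank update of the form W + B·A. The hidden size (768) and rank (8) are assumptions chosen for illustration; this model card does not state the actual LoRA configuration.

```python
def full_update_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_update_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a LoRA update W + B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in),
    so only rank * (d_out + d_in) values are trained.
    """
    return rank * (d_out + d_in)


# Assumed values for illustration only; not taken from the model card.
HIDDEN_SIZE = 768
RANK = 8

full = full_update_params(HIDDEN_SIZE, HIDDEN_SIZE)
lora = lora_update_params(HIDDEN_SIZE, HIDDEN_SIZE, RANK)
print(f"full update: {full} params, LoRA update: {lora} params")
```

Under these assumed values, the low-rank update trains roughly 2% of the parameters that a full update of the same matrix would.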

Model Architecture

  • Base Model: SecRoBERTa (Security-focused RoBERTa)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • Classification Head: 3-class classifier
  • Additional Features: 136-dimensional feature vector for log metadata
  • Max Sequence Length: 512 tokens

Training Details

  • Training Framework: PyTorch + HuggingFace Transformers + PEFT
  • Loss Function: Focal Loss (for handling class imbalance)
  • Optimization: AdamW with learning rate scheduling
  • Data: Wazuh security logs
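
As a minimal illustration of how focal loss handles class imbalance, the sketch below computes the loss for a single example in plain Python. It uses the standard form FL = -(1 - p_t)^gamma * log(p_t); the default gamma of 2.0 and the omission of per-class weights are assumptions, since the training configuration is not given here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example.

    p_t is the predicted probability of the true class. The factor
    (1 - p_t) ** gamma shrinks the loss of well-classified examples,
    so training focuses on hard (often minority-class) examples.
    gamma=2.0 is an assumed default, not the project's setting.
    """
    p_t = max(softmax(logits)[target], 1e-12)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


logits = [4.0, 0.0, 0.0]            # model strongly favors class 0 (benign)
easy = focal_loss(logits, target=0)  # true class matches the prediction
hard = focal_loss(logits, target=2)  # true class is malicious, but was missed
print(f"easy example: {easy:.6f}, hard example: {hard:.6f}")
```

With gamma set to 0, the expression reduces to ordinary cross-entropy, which makes the down-weighting effect easy to check against a baseline.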

Usage

Using transformers library:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
model_name = "pyToshka/wazuh-assist"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Prepare input
text = "Failed login attempt from IP 192.168.1.100"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    logits = outputs.logits
    predicted_class = torch.argmax(logits, dim=-1).item()

# Class mapping
class_names = ["benign", "suspicious", "malicious"]
prediction = class_names[predicted_class]
print(f"Prediction: {prediction}")
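
The snippet above reports only the top class. If a confidence score is also useful, the logits can be passed through a softmax first. The sketch below does this in plain Python; `classify_logits` is a hypothetical helper, not part of the project, and the example logits are made up for illustration.

```python
import math

# Same index-to-label mapping as in the usage example above.
CLASS_NAMES = ["benign", "suspicious", "malicious"]


def classify_logits(logits):
    """Return (label, confidence) for one example's raw logits.

    Hypothetical helper for illustration: applies a numerically
    stable softmax and picks the most probable class.
    """
    if len(logits) != len(CLASS_NAMES):
        raise ValueError(f"expected {len(CLASS_NAMES)} logits, got {len(logits)}")
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best]


# Made-up logits standing in for real model output.
label, confidence = classify_logits([0.2, 1.1, 3.4])
print(f"Prediction: {label} (confidence: {confidence:.3f})")
```

With the transformers example above, the input could be obtained as `logits[0].tolist()`.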

Using the project's custom class (requires the project source on the import path):

from src.models.secroberta import WazuhSecRoBERTa

# Load model
model = WazuhSecRoBERTa.load_model("pyToshka/wazuh-assist")

# Make prediction
log_text = "Failed login attempt from IP 192.168.1.100"
prediction, confidence = model.predict(log_text)
print(f"Prediction: {prediction} (confidence: {confidence:.3f})")

Performance

The model achieves strong performance on Wazuh log classification:

  • High precision for malicious activity detection
  • Good recall for suspicious activity monitoring
  • Balanced accuracy across all three classes

Deployment

This model can be deployed using:

  • ONNX Runtime: For production inference
  • FastAPI: REST API server included in the project
  • Docker: Containerized deployment available

Citation

@misc{wazuh-assist-2025,
  title={Wazuh SecRoBERTa Security Log Classifier},
  author={Your Organization},
  year={2024},
  howpublished={\url{https://huggingface.co/pyToshka/wazuh-assist}},
}

License

BSD 3-Clause License
