VLAI: A RoBERTa-Based Model for Automated Vulnerability Severity Classification
Abstract
A transformer-based model predicts software vulnerability severity levels directly from text, enhancing triage efficiency and consistency.
This paper presents VLAI, a transformer-based model that predicts software vulnerability severity levels directly from text descriptions. Built on RoBERTa, VLAI is fine-tuned on over 600,000 real-world vulnerabilities and achieves over 82% accuracy in predicting severity categories, enabling faster and more consistent triage ahead of manual CVSS scoring. The model and dataset are open-source and integrated into the Vulnerability-Lookup service.
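As a rough illustration of the final classification step, the sketch below turns per-class scores (logits) from a text classifier into a severity label with a confidence value. The label set and its order, the function names, and the example logits are assumptions made for illustration; they are not taken from the VLAI code or model configuration.

```python
import math

# Assumed label order (hypothetical; the real model configuration may differ).
SEVERITY_LABELS = ["low", "medium", "high", "critical"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pick_severity(logits, labels=SEVERITY_LABELS):
    """Return (label, confidence) for the highest-probability class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per severity label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Made-up logits, standing in for the classifier's output on one description.
label, confidence = pick_severity([-1.2, 0.3, 2.1, 0.9])
print(label, round(confidence, 2))
```

The confidence value can help triage: a low-confidence prediction is a natural candidate for earlier manual review.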
Community
At CIRCL (Computer Incident Response Center Luxembourg), we faced the challenge of evaluating vulnerabilities with only partial information, often just a textual description.
To address this, we built an NLP model using the existing dataset from Vulnerability-Lookup. The entire solution has now been released, including the open-source code and the integration into the free online service. With this model, you can obtain a VLAI vulnerability score even when no existing score is available, because it assesses severity from the description alone.
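The fallback described above can be sketched as follows: use the published CVSS base score when a record has one, and fall back to a text-based prediction otherwise. The record fields (`cvss_score`, `description`), the function names, and the keyword stub standing in for the model are hypothetical; the score bands follow the CVSS v3.x qualitative severity rating scale.

```python
def cvss_to_severity(score):
    """Map a CVSS v3.x base score to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def effective_severity(record, classify):
    """Return (severity, source) for one vulnerability record."""
    score = record.get("cvss_score")
    if score is not None:
        return cvss_to_severity(score), "cvss"
    # No published score: rely on the description alone.
    return classify(record["description"]), "predicted"


def stub_classifier(description):
    # Placeholder only: a real deployment would call the trained model here.
    if "remote code execution" in description.lower():
        return "critical"
    return "medium"


records = [
    {"id": "EXAMPLE-1", "cvss_score": 7.5,
     "description": "Denial of service in parser."},
    {"id": "EXAMPLE-2", "cvss_score": None,
     "description": "Remote code execution via crafted request."},
]
for record in records:
    print(record["id"], *effective_severity(record, stub_classifier))
```

Returning the source alongside the severity keeps predicted ratings distinguishable from published ones, so a later manual CVSS score can replace the prediction.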
The model is used in Vulnerability-Lookup thanks to our ML-Gateway component.