Update README.md
README.md
CHANGED
@@ -36,7 +36,7 @@ extra_gated_fields:
 
 # Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
 
-<img src="https://i.imgur.com/PtqeTZw.png" alt="
+<img src="https://i.imgur.com/PtqeTZw.png" alt="Primus Overview" width="60%">
 
 > TL;DR: Llama-Primus-Base is a foundation model based on Llama-3.1-8B-Instruct, continually pre-trained on Primus-Seed (0.2B) and Primus-FineWeb (2.57B). Primus-Seed is a high-quality, manually curated cybersecurity text dataset, while Primus-FineWeb consists of cybersecurity texts filtered from FineWeb. By pretraining on such a large-scale cybersecurity corpus, it achieves a 🚀**15.88%** improvement in aggregated scores across multiple cybersecurity benchmarks, demonstrating the effectiveness of cybersecurity-specific pretraining.
 