Text Generation
Transformers
Safetensors
English
Japanese
llama
cybersecurity
conversational
text-generation-inference

Llama-Primus-Nemotron-70B-Instruct

Llama-Primus-Nemorton

Introduction

The Llama-Primus-Nemotron series builds upon nvidia/Llama-3.1-Nemotron-70B-Instruct through continued training. Following the same methodology as described in the Primus paper, we first performed pre-training on large-scale cybersecurity corpora (over 10B tokens) to obtain Llama-Primus-Nemotron-Base. We then conducted supervised-finetuning and applied DELLA to merge with the original Nemotron, resulting in Llama-Primus-Nemotron-70B-Instruct.

Llama-Primus-Nemotron-70B-Instruct achieves an 18.18% improvement in aggregate scores across several public cybersecurity benchmarks, while maintaining same performance in general-purpose instruction following benchmark (Arena Hard).


Benchmark Results

Cybersecurity

Metric (5-shot, w/ chat template) Llama-3.1-Nemotron-70B-Instruct Llama-Primus-Nemotron-70B-Instruct
CTI-Bench (MCQ) 0.6320 0.7148
CTI-Bench (CVE → CWE) 0.6020 0.6770
CTI-Bench (CVSS, lower is better) 1.4523 1.2469
CTI-Bench (ATE) 0.4284 0.5039
CyberMetric (500) 0.9240 0.9280
SecEval 0.6875 0.7095
CISSP (Exam Questions) 0.8428 0.8625
Aggregate 2.6644 3.1488 ↑18.18% 🔥

CTI-Bench(CVSS) is scored using Mean Absolute Deviation (lower is better), CTI-ATE uses F1 score, and the others use accuracy. The aggregate score (Agg.) is the sum of all benchmarks, with CTI-Bench(CVSS) negated.

References:

General Chat Performance

Metric Llama-3.1-Nemotron-70B-Instruct Llama-Primus-Nemotron-70B-Instruct
Arena Hard 85.1 85.8

Reference:

Safety & Toxicity

Metric Llama-3.1-Nemotron-70B-Instruct Primus-Labor-70B (Llama-3.1-Nemotron-70B-Instruct) 🔥
dan (Jailbreak) 43.14% 61.96%
encoding (Jailbreak) 93.37% 96.87%
goodside (Hallucination / Prompt Injection) 75.00% 72.50%
latentinjection (Prompt Injection) 62.46% 70.35%
leakreplay (Copyright) 88.23% 92.43%
malwaregen (Disallowed content) 18.99% 25.84%
realtoxicityprompts (Disallowed content) 97.55% 98.25%
snowball (Hallucination) 100.00% 100.00%
xss (Prompt Injection) 81.67% 100.00%
XSTest (Over Refusal) 94.40% 97.20%

References:


Training Datasets

Pre-training:

  • Primus-Seed-V2 (0.457B): An enhanced version of Primus-Seed, enriched with blogs, news, books, websites, Wikipedia, MITRE and Trend Micro knowledge.
  • Primus-FineWeb (2.57B): Cybersecurity text filtered from FineWeb-edu-score-2. Link
  • Primus-Nemotron-CC (7.6B): Cybersecurity text filtered from Nemotron-CC.

SFT:

  • Primus-Instruct: LINK

Note: Datasets Primus-Seed-V2 and Primus-Nemotron-CC are not yet open-sourced and are currently under discussion. Feel free to reach out if you're interested.

Disclaimer: No Trend Micro customer information is included.


About Primus

Primus is Trend Micro's pioneering family of lightweight, state-of-the-art open cybersecurity language models and datasets. Developed through our cutting-edge research initiatives and advanced technology, these resources share the innovative foundation that powers our enterprise-class Trend Cybertron solution. As an industry leader in cybersecurity, Trend Micro is proud to contribute these powerful, efficiency-optimized models and datasets to the community, while maintaining the excellence and reliability that define our global security standards.

Acknowledgments

We would like to thank NVIDIA for generously providing computing resources (Taipei-1), which enabled the training and development of this model.

License

This model is based on the MIT license, but you must also comply with the Llama 3.1 Community License Agreement.

Downloads last month
237
Safetensors
Model size
70.6B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for trend-cybertron/Llama-Primus-Nemotron-70B-Instruct

Datasets used to train trend-cybertron/Llama-Primus-Nemotron-70B-Instruct

Collection including trend-cybertron/Llama-Primus-Nemotron-70B-Instruct