---
license: apache-2.0
---

# Fine-Tuned Model for Threat and Intrusion Detection Rule Generation

This model is a fine-tune of [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), obtained via knowledge distillation from [0dAI-7.5B](https://huggingface.co/0dAI/0dAI-7.5B-v2).
The fine-tuning was conducted on a curated corpus of 950 cybersecurity rules drawn from SIGMA, YARA, and Suricata repositories for threat and intrusion detection.

Instruct the model to craft a SIGMA rule for detecting potentially malicious commands such as `msfvenom` and `netcat` in audit system logs, a Suricata rule for spotting SSH brute-force attacks, or a YARA rule for identifying obfuscated strings in files, and watch the magic happen! Automate the creation of detection rules in your cybersecurity systems with this model.
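
For example, here is a minimal command-line prompting sketch with `llama.cpp`, assuming you have already produced a quantized GGUF file as described in the Quantization section below; the file path is a placeholder:

```bash
# Minimal sketch: ask the model for a Suricata rule from the command line.
# Assumes a local llama.cpp build and the quantized GGUF produced in the
# Quantization section below; the path is a placeholder.
./main -m /path/to/Mistral-7B-cybersecurity-rules/mistral-7b-rules-q8_0.gguf \
  -n 512 \
  -p "[INST] Write a Suricata rule that detects SSH brute-force attempts against port 22. [/INST]"
```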

For an in-depth understanding of how this model was fine-tuned, refer to the associated paper here: [link to the paper].

## Key Features
- Fine-tuned on a corpus of cybersecurity threat and intrusion detection rules.
- Expert in generating YARA, Suricata, and SIGMA rules.
- Based on Mistral-7B-Instruct-v0.2, with a 32K context window.

## Quantization
You can quantize the model for local use on your computer with the `llama.cpp` or `ollama` libraries. This process converts the model into a format optimized for efficient inference, which is particularly useful for deployment on devices with limited computational resources.
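
As a one-time setup sketch, assuming you build `llama.cpp` from source and run the conversion commands below from its repository root:

```bash
# Clone and build llama.cpp, then install the Python dependencies
# used by its conversion script.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make
pip install -r requirements.txt
```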

To perform this quantization using the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) library, follow the steps below:

### Step 1: Convert Vocabulary
First, convert your model's vocabulary to a format suitable for quantization. Use the following command, replacing `/path/to/` with the actual path to your model files:
```bash
python convert.py /path/to/Mistral-7B-cybersecurity-rules \
  --vocab-only \
  --outfile /path/to/Mistral-7B-cybersecurity-rules/tokenizer.model \
  --vocab-type bpe
```
This command extracts and converts the vocabulary using the byte pair encoding (BPE) method, saving it to a new file.

### Step 2: Prepare Model for Quantization
Next, prepare the model for quantization by converting it to a half-precision floating-point format (FP16). This step reduces the model size and prepares it for the final quantization to 8-bit integers. Execute the following command:
```bash
# The --vocab-type flag is only needed if you encounter issues with the
# default vocabulary detection.
python convert.py /path/to/Mistral-7B-cybersecurity-rules \
  --outtype f16 \
  --vocab-type bpe \
  --outfile /path/to/Mistral-7B-cybersecurity-rules/ggml-model-f16.gguf
```
This command outputs a file that has been converted to FP16, an intermediate step before applying 8-bit quantization.

### Step 3: Quantize to 8 Bits
Finally, apply 8-bit quantization to the FP16 model file. This step significantly reduces the model's memory footprint, making it suitable for deployment in resource-constrained environments:
```bash
./quantize /path/to/Mistral-7B-cybersecurity-rules/ggml-model-f16.gguf \
  /path/to/Mistral-7B-cybersecurity-rules/mistral-7b-rules-q8_0.gguf \
  q8_0
```
Here, the `quantize` command converts the FP16 model into an 8-bit quantized model, further compressing it while retaining its ability to perform its tasks effectively.
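
As a usage sketch, the resulting GGUF file can also be served locally with `ollama`; the local model name below is an arbitrary placeholder:

```bash
# Register the quantized GGUF with ollama and run it locally.
# "mistral-7b-rules" is an arbitrary local name.
cat > Modelfile <<'EOF'
FROM /path/to/Mistral-7B-cybersecurity-rules/mistral-7b-rules-q8_0.gguf
EOF
ollama create mistral-7b-rules -f Modelfile
ollama run mistral-7b-rules "Write a YARA rule that flags obfuscated strings in PE files."
```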

## License
This repository is licensed under the Apache License, Version 2.0. You can obtain a copy of the license at [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).