jcordon5 committed · verified · Commit 98bf1ea · Parent(s): 3be10b4

Update README.md
---
license: apache-2.0
---
# Fine-Tuned Model for Threat and Intrusion Detection Rule Generation

This model is a fine-tune of [Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), via knowledge distillation of [0dAI-7.5B](https://huggingface.co/0dAI/0dAI-7.5B-v2).
The fine-tuning was conducted using a curated corpus of 950 cybersecurity rules from SIGMA, YARA, and Suricata repositories for threat and intrusion detection.

Instruct the model to craft a SIGMA rule for detecting potentially malicious commands such as `msfvenom` and `netcat` in Audit system logs, a Suricata rule to spot SSH brute-force attacks, or a YARA rule to identify obfuscated strings in files, and watch the magic happen. Use it to automate the creation of detection rules in your cybersecurity systems.

For an in-depth understanding of how this model was fine-tuned, refer to the associated paper here: [link to the paper].

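As a minimal sketch of how a request might be formatted, the Mistral-7B-Instruct family expects user turns wrapped in `[INST] ... [/INST]` tags. The helper below is illustrative only (the function name and example instruction are not part of this repository); pass the resulting string to whatever inference backend you use:

```python
def build_prompt(instruction: str) -> str:
    """Wrap an instruction in the Mistral-Instruct chat format."""
    # Mistral-7B-Instruct-v0.2 expects [INST] ... [/INST] around user turns.
    return f"<s>[INST] {instruction} [/INST]"

# Illustrative rule-generation request (hypothetical wording).
prompt = build_prompt(
    "Write a SIGMA rule that detects msfvenom usage in Audit system logs."
)
print(prompt)
```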
## Key Features
- Fine-tuned on a corpus of cybersecurity threat and intrusion detection rules.
- Expert in generating YARA, Suricata, and SIGMA rules.
- Based on Mistral-7B-Instruct-v0.2, with a 32K context window.

## Quantization
You can easily quantize the model for local use on your computer with the help of the `llama.cpp` or `ollama` tools. Quantization converts the model into a more compact format optimized for performance, which is particularly useful for deployment on devices with limited computational resources.

To perform this quantization using the [`llama.cpp`](https://github.com/ggerganov/llama.cpp) library, follow the steps below:

### Step 1: Convert Vocabulary
First, convert the model's vocabulary to a format suitable for quantization. Use the following command, replacing `/path/to/` with the actual path to your model files:
```bash
python convert.py /path/to/Mistral-7B-cybersecurity-rules \
  --vocab-only \
  --outfile /path/to/Mistral-7B-cybersecurity-rules/tokenizer.model \
  --vocab-type bpe
```
This command extracts and converts the vocabulary using the byte pair encoding (BPE) method, saving it to a new file.

### Step 2: Prepare Model for Quantization
Next, prepare the model for quantization by converting it to a half-precision floating-point format (FP16). This step reduces the model size and prepares it for the final quantization to 8-bit integers. Execute the following command:
```bash
python convert.py /path/to/Mistral-7B-cybersecurity-rules \
  --outtype f16 \
  --vocab-type bpe \
  --outfile /path/to/Mistral-7B-cybersecurity-rules/ggml-model-f16.gguf
```
Include the `--vocab-type bpe` line only if you encounter issues with the vocabulary type. This command outputs a file that has been converted to FP16, an intermediary step before applying 8-bit quantization.

### Step 3: Quantize to 8-bits
Finally, apply 8-bit quantization to the FP16 model file. This step significantly reduces the model's memory footprint, making it suitable for deployment in resource-constrained environments:
```bash
quantize /path/to/Mistral-7B-cybersecurity-rules/ggml-model-f16.gguf \
  /path/to/Mistral-7B-cybersecurity-rules/mistral-7b-rules-q8_0.gguf \
  q8_0
```
Here, the `quantize` command converts the FP16 model into an 8-bit quantized model, further compressing the model while retaining its capability to perform its tasks effectively.

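For a rough sense of the savings, here is a back-of-the-envelope sketch, assuming roughly 7.2 billion parameters and llama.cpp's `q8_0` layout of one FP16 scale per block of 32 int8 weights; actual file sizes will differ somewhat due to metadata and non-quantized tensors:

```python
# Rough size estimate: FP16 vs. q8_0 for a ~7.2B-parameter model.
# q8_0 stores each block of 32 weights as 32 int8 values plus one FP16 scale,
# i.e. (32 * 8 + 16) / 32 = 8.5 bits per weight.
PARAMS = 7.2e9                   # approximate parameter count (assumption)
FP16_BITS = 16
Q8_0_BITS = (32 * 8 + 16) / 32   # 8.5 bits per weight

fp16_gib = PARAMS * FP16_BITS / 8 / 2**30
q8_0_gib = PARAMS * Q8_0_BITS / 8 / 2**30

print(f"FP16: ~{fp16_gib:.1f} GiB, q8_0: ~{q8_0_gib:.1f} GiB")
# The quantized file ends up a bit over half the FP16 size.
```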
## License
This repository is licensed under the Apache License, Version 2.0. You can obtain a copy of the license at [Apache License 2.0](http://www.apache.org/licenses/LICENSE-2.0).