vijilpd commited on
Commit
8e61299
·
verified ·
1 Parent(s): d2aa711

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +106 -3
README.md CHANGED
@@ -1,3 +1,106 @@
1
- ---
2
- license: apache-2.0
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ tags:
4
+ - injection
5
+ - security
6
+ - llm
7
+ - prompt-injection
8
+ ---
9
+
10
+ # Model Card for Vijil Prompt Injection
11
+
12
+ ## Model Details
13
+
14
+ ### Model Description
15
+
16
+ This model is a fine-tuned version of ModernBert to classify prompt-injection prompts which can manipulate language models into producing unintended outputs.
17
+
18
+ - **Developed by:** Vijil AI
19
+ - **License:** apache-2.0
20
+ - **Finetuned version of [ModernBERT](https://huggingface.co/docs/transformers/en/model_doc/modernbert)**
21
+
22
+ ## Uses
23
+
24
+ Prompt injection attacks manipulate language models by inserting or altering prompts to trigger harmful or unintended responses.
25
+ The vijil/mbert-prompt-injection model is designed to enhance security in language model applications by detecting prompt-injection attacks.
26
+
27
+ ## How to Get Started with the Model
28
+
29
+ ```
30
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
31
+ import torch
32
+
33
+ tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
34
+ model = AutoModelForSequenceClassification.from_pretrained("vijil/mbert-prompt-injection")
35
+
36
+ classifier = pipeline(
37
+ "text-classification",
38
+ model=model,
39
+ tokenizer=tokenizer,
40
+ truncation=True,
41
+ max_length=512,
42
+ device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
43
+ )
44
+
45
+ print(classifier("this is a prompt-injection prompt"))
46
+
47
+ ```
48
+
49
+ ## Training Details
50
+
51
+ ### Training Data
52
+
53
+ The dataset used for training the model was taken from
54
+
55
+ [wildguardmix/train](https://huggingface.co/datasets/allenai/wildguardmix)
56
+ and
57
+ [safe-guard-prompt-injection/train](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection)
58
+
59
+ ### Training Procedure
60
+
61
+ Supervised finetuning with above dataset
62
+
63
+ #### Training Hyperparameters
64
+
65
+ * learning_rate: 5e-05
66
+
67
+ * train_batch_size: 32
68
+
69
+ * eval_batch_size: 32
70
+
71
+ * optimizer: adamw_torch_fused
72
+
73
+ * lr_scheduler_type: cosine_with_restarts
74
+
75
+ * warmup_ratio: 0.1
76
+
77
+ * num_epochs: 3
78
+
79
+ ## Evaluation
80
+
81
+ * Training Loss: 0.0036
82
+
83
+ * Validation Loss: 0.209392
84
+
85
+ * Accuracy: 0.961538
86
+
87
+ * Precision: 0.958362
88
+
89
+ * Recall: 0.957055
90
+
91
+ * Fl: 0.957708
92
+
93
+ #### Testing Data
94
+
95
+ The dataset used for training the model was taken from
96
+
97
+ [wildguardmix/test](https://huggingface.co/datasets/allenai/wildguardmix)
98
+ and
99
+ [safe-guard-prompt-injection/test](https://huggingface.co/datasets/xTRam1/safe-guard-prompt-injection)
100
+
101
+ ### Results
102
+
103
+
104
+
105
+ ## Model Card Contact
106
+ https://vijil.ai