Safetensors
English
qwen2
safety
danihinjos committed
Commit 48d4f32 · verified · 1 Parent(s): bbe3a39

Update README.md

  1. README.md +12 -2
README.md CHANGED
@@ -31,14 +31,24 @@ dataset for this model. This results in a DPO dataset composed by triplets < ”
  | | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
  |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
  | Qwen-2.5-7B-Instruct | 0.471 | 0.138 | 0.544 | 0.080 |
- | Qwen-2.5-7B-Egida-DPO | 0.322 | 0.118 | 0.410 | 0.045 |
+ | Qwen-2.5-7B-Instruct-Egida-DPO | 0.322 | 0.118 | 0.410 | 0.045 |

  ### General Purpose Performance

  | | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
  |------------------------------|:---------------------:|:---------------:|
  | Qwen-2.5-7B-Instruct | 0.488 | 0.331 |
- | Qwen-2.5-7B-Egida-DPO | 0.488 | 0.296 |
+ | Qwen-2.5-7B-Instruct-Egida-DPO | 0.488 | 0.296 |
+
+ ### Refusal Ratio
+
+ | | OR Bench 80K (refusal) ↓ | OR Bench Hard (refusal) ↓ |
+ |------------------------------|:---------------------:|:---------------:|
+ | Qwen-2.5-7B-Instruct | 0.021 | 0.175 |
+ | Qwen-2.5-7B-Instruct-Egida-DPO | 0.029 | 0.240 |
+
+ Note that this refusal ratio is computed as keyword matching with a curated list of keywords. For more information, check the paper.
+

  ## Environmental Impact
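The note added in this commit says the refusal ratio is computed by keyword matching against a curated list. A minimal sketch of that kind of metric might look like the following. The keyword list and the function names (`REFUSAL_KEYWORDS`, `is_refusal`, `refusal_ratio`) are illustrative assumptions, not the list or code used for the reported numbers; the paper remains the reference for the actual procedure.

```python
# Hypothetical refusal markers, for illustration only; the curated list
# used for the model card's numbers is not reproduced here.
REFUSAL_KEYWORDS = ("i'm sorry", "i cannot", "i can't", "i am unable")


def is_refusal(response, keywords=REFUSAL_KEYWORDS):
    """Return True if any keyword occurs in the response (case-insensitive)."""
    text = response.lower()
    return any(keyword in text for keyword in keywords)


def refusal_ratio(responses, keywords=REFUSAL_KEYWORDS):
    """Fraction of responses flagged as refusals; 0.0 for an empty input."""
    responses = list(responses)
    if not responses:
        return 0.0
    flagged = sum(is_refusal(r, keywords) for r in responses)
    return flagged / len(responses)


if __name__ == "__main__":
    sample = [
        "I'm sorry, but I can't help with that.",
        "Sure, here is a simple bread recipe.",
    ]
    print(refusal_ratio(sample))
```

Substring matching of this kind is cheap but coarse: it can miss refusals phrased outside the list and can flag helpful answers that happen to contain a listed phrase, which is worth keeping in mind when comparing the two OR Bench columns.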