HPAI-BSC/Qwen2.5-7B-Instruct-Egida-DPO

danihinjos committed on Mar 4

Commit 48d4f32 · verified · 1 Parent(s): bbe3a39

Update README.md

Files changed (1)

README.md +12 -2

@@ -31,14 +31,24 @@ dataset for this model. This results in a DPO dataset composed by triplets < ”
 |                              | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
 |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
 | Qwen-2.5-7B-Instruct         |     0.471      |  0.138   |    0.544     |    0.080    |
-| Qwen-2.5-7B-Egida-DPO        |     0.322      |  0.118   |    0.410     |    0.045    |
 ### General Purpose Performance
 |                              | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
 |------------------------------|:---------------------:|:---------------:|
 | Qwen-2.5-7B-Instruct         |         0.488         |      0.331      |
-| Qwen-2.5-7B-Egida-DPO        |         0.488         |      0.296      |
 ## Environmental Impact

 |                              | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
 |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
 | Qwen-2.5-7B-Instruct         |     0.471      |  0.138   |    0.544     |    0.080    |
+| Qwen-2.5-7B-Instruct-Egida-DPO        |     0.322      |  0.118   |    0.410     |    0.045    |
 ### General Purpose Performance
 |                              | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
 |------------------------------|:---------------------:|:---------------:|
 | Qwen-2.5-7B-Instruct         |         0.488         |      0.331      |
+| Qwen-2.5-7B-Instruct-Egida-DPO        |         0.488         |      0.296      |
+### Refusal Ratio
+|                              | OR Bench 80K (refusal) ↓ | OR Bench Hard (refusal) ↓ |
+|------------------------------|:---------------------:|:---------------:|
+| Qwen-2.5-7B-Instruct         |          0.021           |           0.175           |
+| Qwen-2.5-7B-Instruct-Egida-DPO        |          0.029           |           0.240           |
+Note that this refusal ratio is computed by keyword matching against a curated list of keywords. For more information, check the paper.
 ## Environmental Impact
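
The note added in this diff says the refusal ratio is obtained by keyword matching. A minimal sketch of that kind of metric follows, for illustration only. The keyword list, the function names `is_refusal` and `refusal_ratio`, and the case-insensitive substring matching are all assumptions made here; the curated list and exact matching rules used for the reported numbers are described in the authors' paper and may differ.

```python
from typing import Iterable, Sequence

# Hypothetical keyword list, for illustration only. The curated list behind
# the reported OR Bench numbers is not given in this diff.
REFUSAL_KEYWORDS: tuple[str, ...] = (
    "i cannot",
    "i can't",
    "i'm sorry",
    "i am unable",
)


def is_refusal(response: str, keywords: Sequence[str] = REFUSAL_KEYWORDS) -> bool:
    """Return True if the response contains any refusal keyword.

    Assumption: matching is a case-insensitive substring test, with curly
    apostrophes normalised to straight ones.
    """
    text = response.lower().replace("\u2019", "'")
    return any(keyword in text for keyword in keywords)


def refusal_ratio(
    responses: Iterable[str], keywords: Sequence[str] = REFUSAL_KEYWORDS
) -> float:
    """Fraction of responses flagged as refusals (0.0 for an empty input)."""
    items = list(responses)
    if not items:
        return 0.0
    refused = sum(is_refusal(item, keywords) for item in items)
    return refused / len(items)


if __name__ == "__main__":
    sample = [
        "I'm sorry, but I can't help with that.",
        "Sure, here is a recipe.",
        "I cannot assist with this request.",
        "Here you go.",
    ]
    print(f"refusal ratio: {refusal_ratio(sample):.3f}")
```

A keyword-based ratio like this is cheap to compute but only approximates refusal behaviour: a response that quotes a keyword while still complying is counted as a refusal, and a refusal phrased without any listed keyword is missed.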