leaderboard-pr-bot committed on
Commit 715fe20 · verified · 1 parent: b1de043

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
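In the diff below, each leaderboard score becomes one entry under `model-index` → `results` in the README's YAML front matter. The following sketch rebuilds two of those entries (BBH and MuSR, with the values from this PR) as Python dicts so the nesting is easier to see: `task`, `dataset` with `args.num_few_shot`, `metrics`, and `source`. `make_result` and `LEADERBOARD_URL` are hypothetical names for illustration, not part of the bot or of any Hugging Face library. The output is printed as JSON only because `json` is in the standard library; YAML 1.2 is designed as a superset of JSON, but the README itself uses block-style YAML.

```python
import json

# Hypothetical constant: the leaderboard link this PR attaches to every result.
LEADERBOARD_URL = (
    "https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard"
    "?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b"
)


def make_result(dataset_name, dataset_type, num_few_shot, metric_type, metric_name, value):
    """Hypothetical helper: return a dict shaped like one `- task:` item
    under model-index[0].results in the updated README front matter."""
    return {
        "task": {"type": "text-generation", "name": "Text Generation"},
        "dataset": {
            "name": dataset_name,
            "type": dataset_type,
            "args": {"num_few_shot": num_few_shot},
        },
        "metrics": [{"type": metric_type, "value": value, "name": metric_name}],
        "source": {"url": LEADERBOARD_URL, "name": "Open LLM Leaderboard"},
    }


# Two of the six entries added by this PR, copied from the diff.
model_index = [
    {
        "name": "SpeechlessCoder",
        "results": [
            make_result("BBH (3-Shot)", "BBH", 3, "acc_norm", "normalized accuracy", 29.65),
            make_result("MuSR (0-shot)", "TAUR-Lab/MuSR", 0, "acc_norm", "acc_norm", 13.85),
        ],
    }
]

# Print the structure; a YAML serializer would emit the block style seen in README.md.
print(json.dumps(model_index, indent=2))
```

Generating entries from one helper keeps the repeated `task` and `source` blocks identical across benchmarks, which is the pattern visible in the added YAML.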

Files changed (1)
  1. README.md +114 -8
README.md CHANGED
@@ -1,30 +1,122 @@
  ---
  language:
  - en
  library_name: transformers
- pipeline_tag: text-generation
  datasets:
  - jondurbin/airoboros-2.2.1
  - Open-Orca/OpenOrca
  - garage-bAInd/Open-Platypus
  - ehartford/samantha-data
- tags:
- - llama-2
- - code
- license: llama2
  model-index:
  - name: SpeechlessCoder
    results:
    - task:
        type: text-generation
      dataset:
-       type: openai_humaneval
        name: HumanEval
      metrics:
-     - name: pass@1
-       type: pass@1
        value: 34.146
        verified: false
  ---

  <p><h1> speechless-mistral-dolphin-orca-platypus-samantha-7b </h1></p>
@@ -124,3 +216,17 @@ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-le
  | Winogrande (5-shot) | 78.37 |
  | GSM8K (5-shot) | 21.38 |
  | DROP (3-shot) | 8.66 |

  ---
  language:
  - en
+ license: llama2
  library_name: transformers
+ tags:
+ - llama-2
+ - code
  datasets:
  - jondurbin/airoboros-2.2.1
  - Open-Orca/OpenOrca
  - garage-bAInd/Open-Platypus
  - ehartford/samantha-data
+ pipeline_tag: text-generation
  model-index:
  - name: SpeechlessCoder
    results:
    - task:
        type: text-generation
      dataset:
        name: HumanEval
+       type: openai_humaneval
      metrics:
+     - type: pass@1
        value: 34.146
+       name: pass@1
        verified: false
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: IFEval (0-Shot)
+       type: HuggingFaceH4/ifeval
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: inst_level_strict_acc and prompt_level_strict_acc
+       value: 37.0
+       name: strict accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: BBH (3-Shot)
+       type: BBH
+       args:
+         num_few_shot: 3
+     metrics:
+     - type: acc_norm
+       value: 29.65
+       name: normalized accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MATH Lvl 5 (4-Shot)
+       type: hendrycks/competition_math
+       args:
+         num_few_shot: 4
+     metrics:
+     - type: exact_match
+       value: 2.95
+       name: exact match
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: GPQA (0-shot)
+       type: Idavidrein/gpqa
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 4.47
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MuSR (0-shot)
+       type: TAUR-Lab/MuSR
+       args:
+         num_few_shot: 0
+     metrics:
+     - type: acc_norm
+       value: 13.85
+       name: acc_norm
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
+       name: Open LLM Leaderboard
+   - task:
+       type: text-generation
+       name: Text Generation
+     dataset:
+       name: MMLU-PRO (5-shot)
+       type: TIGER-Lab/MMLU-Pro
+       config: main
+       split: test
+       args:
+         num_few_shot: 5
+     metrics:
+     - type: acc
+       value: 22.12
+       name: accuracy
+     source:
+       url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=uukuguy/speechless-mistral-dolphin-orca-platypus-samantha-7b
+       name: Open LLM Leaderboard
  ---

  <p><h1> speechless-mistral-dolphin-orca-platypus-samantha-7b </h1></p>

  | Winogrande (5-shot) | 78.37 |
  | GSM8K (5-shot) | 21.38 |
  | DROP (3-shot) | 8.66 |
+
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_uukuguy__speechless-mistral-dolphin-orca-platypus-samantha-7b)
+
+ |Metric             |Value|
+ |-------------------|----:|
+ |Avg.               |18.34|
+ |IFEval (0-Shot)    |37.00|
+ |BBH (3-Shot)       |29.65|
+ |MATH Lvl 5 (4-Shot)| 2.95|
+ |GPQA (0-shot)      | 4.47|
+ |MuSR (0-shot)      |13.85|
+ |MMLU-PRO (5-shot)  |22.12|
+
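The `Avg.` row in the table added by this PR can be checked against the six benchmark rows. A minimal sketch, assuming `Avg.` is the plain unweighted mean of the six scores; this commit does not state the leaderboard's exact averaging procedure, and the displayed scores are rounded, so in general a recomputation from the table could differ in the last digit.

```python
# Scores as displayed in the "Open LLM Leaderboard Evaluation Results" table.
scores = {
    "IFEval (0-Shot)": 37.00,
    "BBH (3-Shot)": 29.65,
    "MATH Lvl 5 (4-Shot)": 2.95,
    "GPQA (0-shot)": 4.47,
    "MuSR (0-shot)": 13.85,
    "MMLU-PRO (5-shot)": 22.12,
}

# Assumption: "Avg." is the unweighted arithmetic mean of the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. = {average:.2f}")
```

Here the mean of the rounded values, 110.04 / 6, gives 18.34, matching the `Avg.` row.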