sthenno committed on
Commit 7243178 · verified · 1 Parent(s): 137a399

Adding Evaluation Results (#4)

- Adding Evaluation Results (617907ee2585c00db2267f640e84467a5af308ca)

Files changed (1)
  1. README.md +20 -13
README.md CHANGED
@@ -28,8 +28,7 @@ model-index:
  value: 78.78
  name: strict accuracy
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -44,8 +43,7 @@ model-index:
  value: 50.91
  name: normalized accuracy
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -60,8 +58,7 @@ model-index:
  value: 31.57
  name: exact match
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -76,8 +73,7 @@ model-index:
  value: 17
  name: acc_norm
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -92,8 +88,7 @@ model-index:
  value: 14.77
  name: acc_norm
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
  name: Open LLM Leaderboard
  - task:
  type: text-generation
@@ -110,8 +105,7 @@ model-index:
  value: 47.46
  name: accuracy
  source:
- url: >-
- https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sthenno-com/miscii-14b-1225
  name: Open LLM Leaderboard
  ---
 
@@ -190,4 +184,17 @@ As of **December 25, 2024**, this should be the **best-performing 14B model** in
  |MATH Lvl 5 (4-Shot)|31.57|
  |GPQA (0-shot) |17.00|
  |MuSR (0-shot) |14.77|
- |MMLU-PRO (5-shot) |47.46|
+ |MMLU-PRO (5-shot) |47.46|
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/sthenno-com__miscii-14b-1225-details)
+
+ | Metric |Value|
+ |-------------------|----:|
+ |Avg. |42.35|
+ |IFEval (0-Shot) |78.78|
+ |BBH (3-Shot) |50.91|
+ |MATH Lvl 5 (4-Shot)|45.17|
+ |GPQA (0-shot) |17.00|
+ |MuSR (0-shot) |14.77|
+ |MMLU-PRO (5-shot) |47.46|
+
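
Review note: the `Avg.` row in the added table can be sanity-checked against the six benchmark rows. The sketch below assumes `Avg.` is the unweighted arithmetic mean rounded to two decimals; the diff does not state this. The names `table_scores`, `metadata_scores`, and `average` are illustrative and not part of the commit.

```python
from statistics import mean

# Scores copied from the results table added in this commit.
table_scores = {
    "IFEval (0-Shot)": 78.78,
    "BBH (3-Shot)": 50.91,
    "MATH Lvl 5 (4-Shot)": 45.17,
    "GPQA (0-shot)": 17.00,
    "MuSR (0-shot)": 14.77,
    "MMLU-PRO (5-shot)": 47.46,
}


def average(scores):
    """Unweighted mean rounded to two decimals (assumed meaning of 'Avg.')."""
    return round(mean(scores.values()), 2)


table_avg = average(table_scores)
print(table_avg)  # 42.35, matching the 'Avg.' row

# The model-index metadata and the pre-existing table list 31.57 for MATH Lvl 5.
# Substituting that value shows which figure the reported average is based on.
metadata_scores = {**table_scores, "MATH Lvl 5 (4-Shot)": 31.57}
metadata_avg = average(metadata_scores)
print(metadata_avg)  # 40.08
```

Under that assumption, the reported 42.35 is consistent with the 45.17 MATH Lvl 5 value in the new table, not with the 31.57 that remains in the `model-index` block and the earlier table.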