Update README.md
README.md
CHANGED
@@ -141,25 +141,24 @@ print(make_table(results))
 
 | Benchmark | | |
 |----------------------------------|-------------|-------------------|
-| | Phi-4 mini-Ins | phi4-mini-8dq4w
+| | Phi-4 mini-Ins | phi4-mini-8dq4w|
 | **Popular aggregated benchmark** | | |
-| mmlu
-| mmlu_pro
+| mmlu (0 shot) | 66.73 | 63.11 |
+| mmlu_pro (5-shot) | 44.71 | 35.31 |
 | **Reasoning** | | |
-| arc_challenge |
-|
-| hellaswag | 54.57
-| openbookqa |
-| piqa
-| siqa |
-|
-| winogrande
+| arc_challenge | 56.91 | 55.12 |
+| gpqa_main_zeroshot | 30.13 | 29.02 |
+| hellaswag | 54.57 | 53.23 |
+| openbookqa | 33.00 | 32.40 |
+| piqa (0-shot) | 77.64 | 76.66 |
+| siqa | 49.59 | 47.08 |
+| truthfulqa_mc2 (0-shot) | 48.39 | 47.99 |
+| winogrande (0-shot) | 71.11 | 70.17 |
 | **Multilingual** | | |
-|
-| mgsm_cot_native | TODO | TODO |
+| mgsm_en_cot_en | 60.8? | 0.620 |
 | **Math** | | |
-| gsm8k
-| Mathqa
+| gsm8k (5-shot) | 81.88 | 70.43 |
+| Mathqa (0-shot) | 42.31 | 41.57 |
 | **Overall** | **TODO** | **TODO** |
 
 