Added thinking ablation evaluation results (#3), opened by ranarag
README.md
CHANGED
|
@@ -166,7 +166,7 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
|
|
| 166 |
|
| 167 |
**Evaluation Results:**
|
| 168 |
<table>
|
| 169 |
-
|
| 170 |
<thead>
|
| 171 |
<tr>
|
| 172 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
|
@@ -300,7 +300,7 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
|
|
| 300 |
|
| 301 |
<tr>
|
| 302 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
| 303 |
-
<td style="text-align:center; background-color: #DAE8FF; color: black;">
|
| 304 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
| 305 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
|
| 306 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
|
|
@@ -315,7 +315,51 @@ So, you need to add 10 liters of a 70% acid solution to the initial 10-liter 30%
|
|
| 315 |
</tr>
|
| 316 |
|
| 317 |
|
| 318 |
-
|
| 319 |
|
| 320 |
|
| 321 |
</tbody></table>
|
|
|
|
| 166 |
|
| 167 |
**Evaluation Results:**
|
| 168 |
<table>
|
| 169 |
+
<caption><b>Comparison with Other Models</b></caption>
|
| 170 |
<thead>
|
| 171 |
<tr>
|
| 172 |
<th style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
|
|
|
| 300 |
|
| 301 |
<tr>
|
| 302 |
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
| 303 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
|
| 304 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
| 305 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">57.18</td>
|
| 306 |
<td style="text-align:center; background-color: #DAE8FF; color: black;">20.56</td>
|
|
|
|
| 315 |
</tr>
|
| 316 |
|
| 317 |
|
| 318 |
+
<table>
|
| 319 |
+
<caption><b>Thinking Ablation</b></caption>
|
| 320 |
+
<thead>
|
| 321 |
+
<tr>
|
| 322 |
+
<th rowspan="2" style="text-align:left; background-color: #001d6c; color: white;">Models</th>
|
| 323 |
+
<th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=False</th>
|
| 324 |
+
<th colspan="2" style="text-align:center; background-color: #001d6c; color: white;">Thinking=True</th>
|
| 325 |
+
</tr>
|
| 326 |
+
<tr>
|
| 327 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
| 328 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
|
| 329 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">ArenaHard</th>
|
| 330 |
+
<th style="text-align:center; background-color: #001d6c; color: white;">Alpaca-Eval-2</th>
|
| 331 |
+
</tr></thead>
|
| 332 |
+
<tbody>
|
| 333 |
+
<tr>
|
| 334 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-8B-Instruct</td>
|
| 335 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">37.58</td>
|
| 336 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">30.34</td>
|
| 337 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
| 338 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
| 339 |
+
</tr>
|
| 340 |
+
<tr>
|
| 341 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.1-2B-Instruct</td>
|
| 342 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">23.3</td>
|
| 343 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">27.17</td>
|
| 344 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
| 345 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">-</td>
|
| 346 |
+
</tr>
|
| 347 |
+
<tr>
|
| 348 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;">Granite-3.2-8B-Instruct</td>
|
| 349 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">40.54</td>
|
| 350 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">36.89</td>
|
| 351 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">55.25</td>
|
| 352 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">61.19</td>
|
| 353 |
+
</tr>
|
| 354 |
+
<tr>
|
| 355 |
+
<td style="text-align:left; background-color: #DAE8FF; color: black;"><b>Granite-3.2-2B-Instruct</b></td>
|
| 356 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">30.42</td>
|
| 357 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">31.65</td>
|
| 358 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">26.6</td>
|
| 359 |
+
<td style="text-align:center; background-color: #DAE8FF; color: black;">34.51</td>
|
| 360 |
+
</tr>
|
| 361 |
+
</tbody>
|
| 362 |
+
</table>
|
| 363 |
|
| 364 |
|
| 365 |
</tbody></table>
|
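The "Thinking Ablation" table added in this change can be read as a per-model difference between the two settings. The following is a minimal sketch, not part of the PR: the names `SCORES` and `thinking_delta` are hypothetical, and the numbers are copied from the table rows above. The Granite-3.1 rows are left out because they report no Thinking=True scores.

```python
# Scores copied from the "Thinking Ablation" table in this diff.
# Tuple order: (ArenaHard, Alpaca-Eval-2).
# SCORES and thinking_delta are illustrative names, not part of the repository.
SCORES = {
    "Granite-3.2-8B-Instruct": {"off": (40.54, 36.89), "on": (55.25, 61.19)},
    "Granite-3.2-2B-Instruct": {"off": (30.42, 31.65), "on": (26.6, 34.51)},
}


def thinking_delta(scores):
    """Return, per model, the score change (Thinking=True minus Thinking=False)."""
    deltas = {}
    for model, runs in scores.items():
        # Pair each benchmark's "off" score with its "on" score and subtract.
        deltas[model] = tuple(
            round(on - off, 2) for off, on in zip(runs["off"], runs["on"])
        )
    return deltas


if __name__ == "__main__":
    for model, (arena, alpaca) in thinking_delta(SCORES).items():
        print(f"{model}: ArenaHard {arena:+.2f}, Alpaca-Eval-2 {alpaca:+.2f}")
```

On the values in the table, the 8B model improves on both benchmarks with thinking enabled, while the 2B model improves on Alpaca-Eval-2 but scores lower on ArenaHard.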