This model likely inherits the ability to perform inference in TIR (tool-integrated reasoning) mode from the original model. However, all of our experiments were conducted in CoT (chain-of-thought) mode, so its performance in TIR mode has not been evaluated.

## Evaluation
<img src='https://github.com/analokmaus/kaggle-aimo2-fast-math-r1/blob/master/assets/pass1_aime_all.png?raw=true' max-height='400px'>

| Model               | Token budget | AIME 2024 Pass@1 (avg. 64) | AIME 2024 mean output tokens | AIME 2025 Pass@1 (avg. 64) | AIME 2025 mean output tokens |
| ------------------- | ------------ | -------------------------- | ---------------------------- | -------------------------- | ---------------------------- |
| Qwen3-14B           | 32000        | 79.3                       | 13669                        | 69.5                       | 16481                        |
|                     | 24000        | 75.9                       | 13168                        | 65.6                       | 15235                        |
|                     | 16000        | 64.5                       | 11351                        | 50.4                       | 12522                        |
|                     | 12000        | 49.7                       | 9746                         | 36.3                       | 10353                        |
|                     | 8000         | 28.4                       | 7374                         | 19.5                       | 7485                         |
| Fast-Math-Qwen3-14B | 32000        | 77.6                       | 9740                         | 66.6                       | 12281                        |
|                     | 24000        | 76.5                       | 9634                         | 65.3                       | 11847                        |
|                     | 16000        | 72.6                       | 8793                         | 60.1                       | 10195                        |
|                     | 12000        | 65.1                       | 7775                         | 49.4                       | 8733                         |
|                     | 8000         | 50.7                       | 6260                         | 36.0                       | 6618                         |
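To read the table, it can help to compute, at each token budget, the Pass@1 gap and the relative reduction in mean output tokens between the two models. The Python sketch below does this for the AIME 2024 columns, with values copied from the table. It assumes that "Pass@1 (avg. 64)" means 64 sampled solutions per problem, with Pass@1 estimated as the mean fraction of correct samples; the helpers `pass_at_1` and `compare` are hypothetical and are not part of this repository.

```python
from statistics import mean

# Hypothetical helper (assumption): pass@1 estimated from n samples per problem.
# correct[p][s] is True if sample s for problem p was judged correct.
# The per-problem fraction of correct samples is an unbiased pass@1 estimate;
# averaging it over problems gives the benchmark score (in percent here).
def pass_at_1(correct: list[list[bool]]) -> float:
    return 100.0 * mean(mean(1.0 if ok else 0.0 for ok in samples) for samples in correct)

# AIME 2024 columns copied from the table: budget -> (Pass@1, mean output tokens).
QWEN3_14B = {
    32000: (79.3, 13669), 24000: (75.9, 13168), 16000: (64.5, 11351),
    12000: (49.7, 9746), 8000: (28.4, 7374),
}
FAST_MATH_QWEN3_14B = {
    32000: (77.6, 9740), 24000: (76.5, 9634), 16000: (72.6, 8793),
    12000: (65.1, 7775), 8000: (50.7, 6260),
}

# Hypothetical helper: for each budget (largest first), return
# (budget, Pass@1 difference in points, percent reduction in mean output tokens)
# of `tuned` relative to `base`.
def compare(base: dict, tuned: dict) -> list[tuple[int, float, float]]:
    rows = []
    for budget in sorted(base, reverse=True):
        base_acc, base_tok = base[budget]
        tuned_acc, tuned_tok = tuned[budget]
        acc_gap = round(tuned_acc - base_acc, 1)
        token_cut = round(100.0 * (1.0 - tuned_tok / base_tok), 1)
        rows.append((budget, acc_gap, token_cut))
    return rows

if __name__ == "__main__":
    for budget, acc_gap, token_cut in compare(QWEN3_14B, FAST_MATH_QWEN3_14B):
        print(f"budget={budget:>5}  Pass@1 {acc_gap:+.1f} pts  tokens -{token_cut:.1f}%")
```

For example, at the 32000-token budget Fast-Math-Qwen3-14B scores 1.7 points lower on AIME 2024 while producing about 28.7% fewer output tokens, whereas at 8000 tokens it scores 22.3 points higher with about 15.1% fewer tokens. The same comparison applies to the AIME 2025 columns by swapping in those values.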
# Inference

## vLLM