Update README.md
README.md
CHANGED
@@ -37,36 +37,37 @@ This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moon
 
 ## 2. Performance
 
-Comparison with efficient models and two previous versions of Kimi-VL:
+Comparison with efficient models and two previous versions of Kimi-VL (results of GPT-4o are for reference only and are shown in *italics*):
 
 <div align="center">
 
 | Benchmark (Metric) | GPT-4o | Qwen2.5-VL-7B | Gemma3-12B-IT | Kimi-VL-A3B-Instruct | Kimi-VL-A3B-Thinking | Kimi-VL-A3B-Thinking-2506 |
 |----------------------------|--------|---------------|---------------|----------------------|----------------------|--------------------------|
 | **General Multimodal** | | | | | | |
-| MMBench-EN-v1.1 (Acc) | 83.1 | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
-| RealWorldQA (Acc) | 75.4 | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
-| OCRBench (Acc) | 815 | 864 | 702 | 864 | 864 | **869** |
-| MMStar (Acc) | 64.7 | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
-| MMVet (Acc) | 69.1 | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
+| MMBench-EN-v1.1 (Acc) | *83.1* | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
+| RealWorldQA (Acc) | *75.4* | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
+| OCRBench (Acc) | *815* | 864 | 702 | 864 | 864 | **869** |
+| MMStar (Acc) | *64.7* | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
+| MMVet (Acc) | *69.1* | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
 | **Reasoning** | | | | | | |
-| MMMU (val, Pass@1) | 69.1 | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
-| MMMU-Pro (Pass@1) | 51.7 | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
+| MMMU (val, Pass@1) | *69.1* | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
+| MMMU-Pro (Pass@1) | *51.7* | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
 | **Math** | | | | | | |
-| MATH-Vision (Pass@1) | 30.4 | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
-| MathVista_MINI (Pass@1) | 63.8 | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
+| MATH-Vision (Pass@1) | *30.4* | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
+| MathVista_MINI (Pass@1) | *63.8* | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
 | **Video** | | | | | | |
-| VideoMMMU (Pass@1) | 61.2 | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
-| MMVU (Pass@1) | 67.4 | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
-| Video-MME (w/ sub.) | 77.2 | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
+| VideoMMMU (Pass@1) | *61.2* | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
+| MMVU (Pass@1) | *67.4* | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
+| Video-MME (w/ sub.) | *77.2* | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
 | **Agent Grounding** | | | | | | |
-| ScreenSpot-Pro (Acc) | 0.8 | 29.0 | — | 35.4 | — | **52.8** |
-| ScreenSpot-V2 (Acc) | 18.1 | 84.2 | — | **92.8** | — | 91.4 |
-| OSWorld-G (Acc) | - | 31.5 | — | 41.6 | — | **52.5** |
+| ScreenSpot-Pro (Acc) | *0.8* | 29.0 | — | 35.4 | — | **52.8** |
+| ScreenSpot-V2 (Acc) | *18.1* | 84.2 | — | **92.8** | — | 91.4 |
+| OSWorld-G (Acc) | - | *31.5* | — | 41.6 | — | **52.5** |
 | **Long Document** | | | | | | |
-| MMLongBench-DOC (Acc) | 42.8 | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
+| MMLongBench-DOC (Acc) | *42.8* | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
 </div>
 
+
 Comparison with 30B-70B open-source models:
 
 <div align="center">