teowu committed
Commit a7044da · verified · 1 Parent(s): c2e910a

Update README.md

Files changed (1)
  1. README.md +18 -17
README.md CHANGED
@@ -37,36 +37,37 @@ This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moon
 
 ## 2. Performance
 
-Comparison with efficient models and two previous versions of Kimi-VL:
+Comparison with efficient models and two previous versions of Kimi-VL (*Results of GPT-4o are for reference only and shown in <i>italics</i>*):
 
 <div align="center">
 
 | Benchmark (Metric) | GPT-4o | Qwen2.5-VL-7B | Gemma3-12B-IT | Kimi-VL-A3B-Instruct | Kimi-VL-A3B-Thinking | Kimi-VL-A3B-Thinking-2506 |
 |----------------------------|--------|---------------|---------------|----------------------|----------------------|--------------------------|
 | **General Multimodal** | | | | | | |
-| MMBench-EN-v1.1 (Acc) | 83.1 | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
-| RealWorldQA (Acc) | 75.4 | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
-| OCRBench (Acc) | 815 | 864 | 702 | 864 | 864 | **869** |
-| MMStar (Acc) | 64.7 | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
-| MMVet (Acc) | 69.1 | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
+| MMBench-EN-v1.1 (Acc) | *83.1* | 83.2 | 74.6 | 82.9 | 76.0 | **84.4** |
+| RealWorldQA (Acc) | *75.4* | 68.5 | 59.1 | 68.1 | 64.0 | **70.0** |
+| OCRBench (Acc) | *815* | 864 | 702 | 864 | 864 | **869** |
+| MMStar (Acc) | *64.7* | 63.0 | 56.1 | 61.7 | 64.2 | **70.4** |
+| MMVet (Acc) | *69.1* | 67.1 | 64.9 | 66.7 | 69.5 | **78.1** |
 | **Reasoning** | | | | | | |
-| MMMU (val, Pass@1) | 69.1 | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
-| MMMU-Pro (Pass@1) | 51.7 | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
+| MMMU (val, Pass@1) | *69.1* | 58.6 | 59.6 | 57.0 | 61.7 | **64.0** |
+| MMMU-Pro (Pass@1) | *51.7* | 38.1 | 32.1 | 36.0 | 43.2 | **46.3** |
 | **Math** | | | | | | |
-| MATH-Vision (Pass@1) | 30.4 | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
-| MathVista_MINI (Pass@1) | 63.8 | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
+| MATH-Vision (Pass@1) | *30.4* | 25.0 | 32.1 | 21.7 | 36.8 | **56.9** |
+| MathVista_MINI (Pass@1) | *63.8* | 68.0 | 56.1 | 68.6 | 71.7 | **80.1** |
 | **Video** | | | | | | |
-| VideoMMMU (Pass@1) | 61.2 | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
-| MMVU (Pass@1) | 67.4 | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
-| Video-MME (w/ sub.) | 77.2 | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
+| VideoMMMU (Pass@1) | *61.2* | 47.4 | 57.0 | 52.1 | 55.5 | **65.2** |
+| MMVU (Pass@1) | *67.4* | 50.1 | 57.0 | 52.7 | 53.0 | **57.5** |
+| Video-MME (w/ sub.) | *77.2* | 71.6 | 62.1 | **72.7** | 66.0 | 71.9 |
 | **Agent Grounding** | | | | | | |
-| ScreenSpot-Pro (Acc) | 0.8 | 29.0 | — | 35.4 | — | **52.8** |
-| ScreenSpot-V2 (Acc) | 18.1 | 84.2 | — | **92.8** | — | 91.4 |
-| OSWorld-G (Acc) | - | 31.5 | — | 41.6 | — | **52.5** |
+| ScreenSpot-Pro (Acc) | *0.8* | 29.0 | — | 35.4 | — | **52.8** |
+| ScreenSpot-V2 (Acc) | *18.1* | 84.2 | — | **92.8** | — | 91.4 |
+| OSWorld-G (Acc) | - | *31.5* | — | 41.6 | — | **52.5** |
 | **Long Document** | | | | | | |
-| MMLongBench-DOC (Acc) | 42.8 | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
+| MMLongBench-DOC (Acc) | *42.8* | 29.6 | 21.3 | 35.1 | 32.5 | **42.1** |
 </div>
 
+
 Comparison with 30B-70B open-source models:
 
 <div align="center">