Add Hugging Face Papers link and base model

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +3 -5
README.md CHANGED
@@ -1,9 +1,9 @@
 ---
 base_model:
 - moonshotai/Kimi-VL-A3B-Instruct
+library_name: transformers
 license: mit
 pipeline_tag: image-text-to-text
-library_name: transformers
 ---
 
 <div align="center">
@@ -11,7 +11,7 @@ library_name: transformers
 </div>
 
 <div align="center">
-<a href="https://arxiv.org/abs/2504.07491">
+<a href="https://huggingface.co/papers/2504.07491">
 <b>📄 Tech Report</b>
 </a> &nbsp;|&nbsp;
 <a href="https://github.com/MoonshotAI/Kimi-VL">
@@ -34,7 +34,7 @@ Kimi-VL also advances the pareto frontiers of multimodal models in processing lo
 
 Building on this foundation, we introduce an advanced long-thinking variant: **Kimi-VL-Thinking**. Developed through long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL), this model exhibits strong long-horizon reasoning capabilities. It achieves scores of 61.7 on MMMU, 36.8 on MathVision, and 71.3 on MathVista while maintaining the compact 2.8B activated LLM parameter footprint, setting a new standard for efficient yet capable multimodal **thinking** models.
 
-More information can be found in our technical report: [Kimi-VL Technical Report](https://arxiv.org/abs/2504.07491).
+More information can be found in our technical report: [Kimi-VL Technical Report](https://huggingface.co/papers/2504.07491).
 
 ## 2. Architecture
 
@@ -62,8 +62,6 @@ The model adopts an MoE language model, a native-resolution visual encoder (Moon
 > - For **Thinking models**, it is recommended to use `Temperature = 0.6`.
 > - For **Instruct models**, it is recommended to use `Temperature = 0.2`.
 
-
-
 ## 4. Performance
 
 With effective long-thinking abilitites, Kimi-VL-A3B-Thinking can match the performance of 30B/70B frontier open-source VLMs on MathVision benchmark: