Add Hugging Face Papers link and base model
#1
by nielsr (HF Staff) · opened
README.md CHANGED

@@ -1,9 +1,9 @@
 ---
 base_model:
 - moonshotai/Kimi-VL-A3B-Instruct
+library_name: transformers
 license: mit
 pipeline_tag: image-text-to-text
-library_name: transformers
 ---
 
 <div align="center">
@@ -11,7 +11,7 @@ library_name: transformers
 </div>
 
 <div align="center">
-  <a href="https://
+  <a href="https://huggingface.co/papers/2504.07491">
    <b>📖 Tech Report</b>
   </a> |
   <a href="https://github.com/MoonshotAI/Kimi-VL">
@@ -34,7 +34,7 @@ Kimi-VL also advances the pareto frontiers of multimodal models in processing lo
 
 Building on this foundation, we introduce an advanced long-thinking variant: **Kimi-VL-Thinking**. Developed through long chain-of-thought (CoT) supervised fine-tuning (SFT) and reinforcement learning (RL), this model exhibits strong long-horizon reasoning capabilities. It achieves scores of 61.7 on MMMU, 36.8 on MathVision, and 71.3 on MathVista while maintaining the compact 2.8B activated LLM parameter footprint, setting a new standard for efficient yet capable multimodal **thinking** models.
 
-More information can be found in our technical report: [Kimi-VL Technical Report](https://
+More information can be found in our technical report: [Kimi-VL Technical Report](https://huggingface.co/papers/2504.07491).
 
 ## 2. Architecture
 
@@ -62,8 +62,6 @@ The model adopts an MoE language model, a native-resolution visual encoder (Moon
 > - For **Thinking models**, it is recommended to use `Temperature = 0.6`.
 > - For **Instruct models**, it is recommended to use `Temperature = 0.2`.
 
-
-
 ## 4. Performance
 
 With effective long-thinking abilitites, Kimi-VL-A3B-Thinking can match the performance of 30B/70B frontier open-source VLMs on MathVision benchmark:
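The `library_name: transformers` metadata moved in this diff tells the Hub to surface transformers loading snippets for the checkpoint. As a minimal sketch of the README's recommended sampling temperatures (the `Kimi-VL-A3B-Thinking` model id and the `sampling_kwargs` helper are illustrative assumptions, not part of the PR):

```python
# Recommended sampling temperatures quoted in the diffed README:
# Thinking models -> 0.6, Instruct models -> 0.2.
RECOMMENDED_TEMPERATURE = {
    "moonshotai/Kimi-VL-A3B-Thinking": 0.6,  # hypothetical id for the Thinking variant
    "moonshotai/Kimi-VL-A3B-Instruct": 0.2,  # id from the PR's base_model metadata
}

def sampling_kwargs(model_id: str) -> dict:
    """Keyword arguments one might pass to a transformers-style
    generate() call, using the README-recommended temperature."""
    return {"do_sample": True, "temperature": RECOMMENDED_TEMPERATURE[model_id]}

print(sampling_kwargs("moonshotai/Kimi-VL-A3B-Instruct"))
# {'do_sample': True, 'temperature': 0.2}
```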