SoybeanMilk committed on
Commit 714130b · verified · 1 Parent(s): b857500

Update README.md

Files changed (1)
  1. README.md +33 -0
README.md CHANGED
@@ -25,6 +25,39 @@ library_name: transformers
  <a href="https://huggingface.co/spaces/moonshotai/Kimi-VL-A3B-Thinking">💬 <b>Chat Web</b></a>
  </div>
 
+
+ ## 0. Colab Inference Notebook
+
+ ### 4-bit Quantized MoE Model
+
+ Thank you for your interest in this 4-bit quantized Mixture of Experts (MoE) model!
+
+ ### Current Limitations
+
+ ⚠️ **Important Note**: In recent testing, **vLLM did not yet support MoE models quantized to 4-bit with bitsandbytes (BNB)**. This is a limitation on vLLM's side and is not caused by your setup or configuration.
+
+ ### Working Solution
+
+ I've prepared a Colab notebook that shows how to load this model and run inference using:
+
+ - **The standard `transformers` library**
+ - **bitsandbytes (BNB) 4-bit quantization**
+
+ ### 🚀 [View Colab Notebook](https://colab.research.google.com/drive/1WAebQWzWmHGVlL2mi3rukWpw1195W4AC?usp=sharing)
+
+ The notebook can be used as an alternative to vLLM for:
+ - Model deployment
+ - Testing and evaluation
+ - Inference demonstrations
+
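For readers who want to try the same approach outside Colab, the following is a minimal sketch of loading a pre-quantized BNB 4-bit checkpoint with plain `transformers`. It is not taken from the notebook, which may differ. The package list, the `model_id` placeholder, the helper names, and the use of `trust_remote_code=True` are all assumptions made for illustration.

```python
import importlib.util

# Packages this sketch assumes are needed; the notebook may pin others.
REQUIRED_PACKAGES = ["torch", "transformers", "accelerate", "bitsandbytes"]


def missing_packages(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def load_quantized_model(model_id):
    """Load a pre-quantized BNB 4-bit checkpoint with plain transformers.

    `model_id` is a placeholder for this repository's Hub id (hypothetical here).
    """
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        raise RuntimeError(f"Install these packages first: {', '.join(missing)}")

    # Imported lazily so the dependency check above can run without them.
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Assumption: the checkpoint was saved already quantized, so the
    # quantization settings are read from its own config.
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",       # place weights on the available GPU(s)
        trust_remote_code=True,  # assumption: the model ships custom modeling code
    )
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
    return model, processor
```

Calling `missing_packages(REQUIRED_PACKAGES)` first gives a quick list of what still needs to be installed. `load_quantized_model` should only be called on a machine with a CUDA GPU, since bitsandbytes 4-bit loading generally expects one.
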
+ ### Alternative Approach
+
+ Until vLLM supports BNB 4-bit MoE models, the Colab notebook linked above is a working way to run this 4-bit quantized MoE model.
+
+ Feel free to use it as a reference for your own projects or deployments.
+
+ ---
+
  ## 1. Introduction
 
  This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking), with the following improved abilities: