Update README.md
README.md
CHANGED
@@ -25,6 +25,39 @@ library_name: transformers
<a href="https://huggingface.co/spaces/moonshotai/Kimi-VL-A3B-Thinking">💬 <b>Chat Web</b></a>
</div>

## 0. Colab Inference Notebook

### 4-bit Quantized MoE Model

Thank you for your interest in this 4-bit quantized Mixture of Experts (MoE) model!

### Current Limitations

⚠️ **Important Note**: As of recent testing, **vLLM does not yet support MoE models quantized with bitsandbytes (BNB) 4-bit**. This is a limitation on vLLM's side, not related to your setup or configuration.

### Working Solution

I've prepared a Colab notebook that demonstrates how to load and run this model with full inference support (a minimal loading sketch follows the list below), using:

- **Standard transformers library**
- **bitsandbytes (BNB) 4-bit quantization**
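
For reference, here is a minimal loading sketch along the same lines as the notebook. It is an illustration under assumptions, not the notebook's exact code: the model id, dtype, and NF4 settings are placeholders you may want to adjust, and if this repo already stores pre-quantized BNB weights, the explicit `quantization_config` can be omitted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoProcessor, BitsAndBytesConfig

# 4-bit NF4 quantization via bitsandbytes (illustrative settings)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Placeholder id: point this at the base model or at this repo's weights
model_id = "moonshotai/Kimi-VL-A3B-Thinking"

# Kimi-VL ships custom modeling code, so trust_remote_code is required
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
```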

#### 🚀 [View Colab Notebook](https://colab.research.google.com/drive/1WAebQWzWmHGVlL2mi3rukWpw1195W4AC?usp=sharing)

This notebook provides a reliable alternative for:

- Model deployment
- Testing and evaluation
- Inference demonstrations (see the sketch below)
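
As a rough sketch of such an inference demonstration, the snippet below reuses the `model` and `processor` from the loading example and follows the base model card's chat-template pattern; the image path and prompt are placeholders, and the notebook's actual cells may differ.

```python
from PIL import Image

image_path = "demo.png"  # placeholder input image
image = Image.open(image_path)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image_path},
            {"type": "text", "text": "Describe this image. Think step by step."},
        ],
    }
]

# Build the chat prompt, then pack text + image into model inputs
text = processor.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
inputs = processor(images=image, text=text, return_tensors="pt", padding=True, truncation=True).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=512)
# Strip the prompt tokens so only the newly generated answer is decoded
trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, generated_ids)]
print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])
```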

### Alternative Approach

While we wait for vLLM to add support for this specific combination, the provided Colab notebook offers a stable and efficient way to work with the 4-bit quantized MoE model.

Feel free to use this as a reference implementation for your own projects or deployments.

---

## 1. Introduction

This is an updated version of [Kimi-VL-A3B-Thinking](https://huggingface.co/moonshotai/Kimi-VL-A3B-Thinking), with the following improved abilities: