update README

README.md

## `ik_llama.cpp` imatrix MLA Quantizations of DeepSeek-V3-0324 by deepseek-ai

This quant collection is intended for use with the [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/) fork.
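
These quants rely on the fork's quant types and MLA support, so you'll need to build `ik_llama.cpp` from source. A minimal sketch, assuming a CUDA build (see the Getting Started Guide under References for the full walkthrough):

```bash
# Minimal sketch: clone and build ik_llama.cpp with CUDA enabled.
# For pure-CPU inference, drop the -DGGML_CUDA=ON option.
git clone https://github.com/ikawrakow/ik_llama.cpp
cd ik_llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j $(nproc)
```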

All of these quants support MLA, allowing 32k (some even 64k) of context in under 24GB of GPU VRAM for `R1` and `V3` while offloading the MoE layers to CPU RAM.
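
As a rough illustration of that hybrid setup, here is a sketch of a server launch on a single 24GB GPU. The model file name, thread count, and buffer size are placeholders, and the `-mla`, `-fa`, `-amb`, `-fmoe`, and `--override-tensor` options are covered in the Getting Started Guide under References:

```bash
# Hypothetical sketch: 32k context on one 24GB GPU, with the routed
# MoE expert tensors overridden to stay in CPU RAM.
# Adjust --model to the quant file you downloaded and --threads
# to your physical core count.
./build/bin/llama-server \
    --model DeepSeek-V3-0324-IQ2_K_R4.gguf \
    --ctx-size 32768 \
    -ctk q8_0 \
    -mla 2 -fa \
    -amb 512 \
    -fmoe \
    --n-gpu-layers 63 \
    --override-tensor exps=CPU \
    --threads 16 \
    --host 127.0.0.1 --port 8080
```

The `--override-tensor exps=CPU` line is what makes this fit: everything except the routed experts lands on the GPU, while the big MoE weights stay in system RAM.
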
## TODO

- [ ] Upload `imatrix.dat` computed with MLA tensors (see the sketch after this list)
- [ ] Upload my favorite SOTA quants available on `ik_llama.cpp`, optimized for hybrid GPU+CPU and pure-CPU inference

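On the first TODO item: the point is to compute the importance matrix with MLA enabled so the decomposed MLA attention tensors accumulate calibration statistics. A rough sketch, assuming the fork's `llama-imatrix` accepts the same MLA flags as its other tools and with `calibration.txt` standing in for your calibration corpus:

```bash
# Hypothetical sketch: compute an imatrix with MLA enabled so the
# MLA attention tensors are exercised during calibration.
# calibration.txt and the model file name are placeholders.
./build/bin/llama-imatrix \
    -m DeepSeek-V3-0324-Q8_0.gguf \
    -f calibration.txt \
    -o imatrix.dat \
    -mla 2 -fa \
    --ctx-size 512
```
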
## Big Thanks

Big thanks to all the folks in the quanting and inferencing community here and on `r/LocalLLaMA` for sharing tips and tricks to help each other access all the fun new models!

Shout out to the **Level1Techs** crew and community ([Forums](https://forum.level1techs.com/t/deepseek-deep-dive-r1-at-home/225826), [YouTube Channel](https://www.youtube.com/@Level1Techs)) for providing big hardware expertise and access to run these experiments!!!

Finally, I'm still learning the ropes, so please be patient and we can learn together. Thanks!

## References

* [ik_llama.cpp](https://github.com/ikawrakow/ik_llama.cpp/)
* [ik_llama.cpp Getting Started Guide](https://github.com/ikawrakow/ik_llama.cpp/discussions/258)