Update README.md
README.md CHANGED
@@ -18,6 +18,19 @@ base_model: DavidAU/Qwen3-30B-A1.5B-High-Speed
This model was converted to GGUF format from [`DavidAU/Qwen3-30B-A1.5B-High-Speed`](https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed) for more details on the model.

+---
+
+This is a simple "finetune" of Qwen's "Qwen 30B-A3B" (MoE) model, setting the experts in use from 8 to 4 (out of 128 experts).
+
+This method close to doubles the speed of the model and uses 1.5B (of 30B) parameters instead of 3B (of 30B) parameters. Depending on the application you may want to use the regular model ("30B-A3B") and keep this one for simpler use cases, although I did not notice any loss of function during routine (but not extensive) testing.
+
+---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)
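
For reference, the GGUF-my-repo template usually continues with the standard llama.cpp invocation. A minimal sketch, assuming placeholder names: `your-namespace/...` and the quant filename below are illustrative, so substitute whatever the conversion space actually produced.

```bash
# Install llama.cpp via Homebrew (the formula works on macOS and Linux)
brew install llama.cpp

# Run the converted model with the CLI; --hf-repo/--hf-file fetch the GGUF
# from the Hub. Both values below are placeholders, not confirmed filenames.
llama-cli \
  --hf-repo your-namespace/Qwen3-30B-A1.5B-High-Speed-Q4_K_M-GGUF \
  --hf-file qwen3-30b-a1.5b-high-speed-q4_k_m.gguf \
  -p "The meaning to life and the universe is"
```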
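
The expert-count reduction described above can also be approximated at runtime, since llama.cpp can override GGUF metadata at load time via `--override-kv`. A sketch, assuming the relevant metadata key for this architecture is `qwen3moe.expert_used_count` (an assumption; inspect the file's metadata before relying on it):

```bash
# Hypothetical: load the regular 30B-A3B quant but activate only 4 experts
# per token. The key name qwen3moe.expert_used_count is an assumption --
# verify it against the GGUF's actual metadata first.
llama-cli -m qwen3-30b-a3b-q4_k_m.gguf \
  --override-kv qwen3moe.expert_used_count=int:4 \
  -p "Hello"
```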