Triangle104 committed · Commit 05f02ef · verified · 1 Parent(s): 0f49afa

Update README.md

Files changed (1): README.md (+13 −0)
README.md CHANGED
@@ -18,6 +18,19 @@ base_model: DavidAU/Qwen3-30B-A1.5B-High-Speed
  This model was converted to GGUF format from [`DavidAU/Qwen3-30B-A1.5B-High-Speed`](https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed) for more details on the model.
 
+ ---
+ This is a simple "finetune" of Qwen's "30B-A3B" (MoE) model, reducing
+ the number of experts in use from 8 to 4 (out of 128 experts).
+
+ This method nearly doubles the speed of the model and activates 1.5B
+ (of 30B) parameters instead of 3B (of 30B). Depending on the
+ application, you may want to use the regular model ("30B-A3B") and
+ reserve this model for simpler use cases, although I did not notice
+ any loss of function during routine (but not extensive) testing.
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
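The brew install step from the README, plus a minimal sketch of running this GGUF with llama.cpp. The filename `qwen3-30b-a1.5b-high-speed-q4_k_m.gguf` and the metadata key `qwen3moe.expert_used_count` are assumptions (inspect your file's GGUF metadata to confirm the key name); `--override-kv` is llama.cpp's general mechanism for overriding such metadata at load time, which is one way to reproduce the 8-to-4 expert reduction described above on an unmodified GGUF.

```shell
# Install llama.cpp (works on Mac and Linux), as the README suggests
brew install llama.cpp

# Run the model. A GGUF made from this repo already has the reduced
# expert count baked in; on a stock Qwen3-30B-A3B GGUF, the same effect
# can be approximated at load time with --override-kv.
# NOTE: the filename and the 'qwen3moe.expert_used_count' key are
# assumptions -- verify them against your file's GGUF metadata.
llama-cli \
  -m qwen3-30b-a1.5b-high-speed-q4_k_m.gguf \
  --override-kv qwen3moe.expert_used_count=int:4 \
  -p "Hello"
```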