Triangle104 committed · Commit 05f02ef · verified · 1 Parent(s): 0f49afa

Update README.md

Files changed (1): README.md (+13 −0)
README.md CHANGED
@@ -18,6 +18,19 @@ base_model: DavidAU/Qwen3-30B-A1.5B-High-Speed
  This model was converted to GGUF format from [`DavidAU/Qwen3-30B-A1.5B-High-Speed`](https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/DavidAU/Qwen3-30B-A1.5B-High-Speed) for more details on the model.
 
+ ---
+ This is a simple "finetune" of Qwen's "30B-A3B" (MoE) model, reducing
+ the number of experts in use from 8 to 4 (out of 128 experts).
+
+ This method nearly doubles the speed of the model and activates 1.5B
+ (of 30B) parameters instead of 3B (of 30B). Depending on the
+ application, you may want to use the regular model ("30B-A3B") and
+ reserve this model for simpler use cases, although I did not notice
+ any loss of function during routine (but not extensive) testing.
+
+ ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
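The brew install step from the README, plus a minimal sketch of running this GGUF with llama.cpp. The filename `qwen3-30b-a1.5b-high-speed-q4_k_m.gguf` and the metadata key `qwen3moe.expert_used_count` are assumptions (inspect your file's GGUF metadata to confirm the key name); `--override-kv` is llama.cpp's general mechanism for overriding such metadata at load time, which is one way to reproduce the 8-to-4 expert reduction described above on an unmodified GGUF.

```shell
# Install llama.cpp (works on Mac and Linux), as the README suggests
brew install llama.cpp

# Run the model. A GGUF made from this repo already has the reduced
# expert count baked in; on a stock Qwen3-30B-A3B GGUF, the same effect
# can be approximated at load time with --override-kv.
# NOTE: the filename and the 'qwen3moe.expert_used_count' key are
# assumptions -- verify them against your file's GGUF metadata.
llama-cli \
  -m qwen3-30b-a1.5b-high-speed-q4_k_m.gguf \
  --override-kv qwen3moe.expert_used_count=int:4 \
  -p "Hello"
```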