DavidAU/Qwen3-30B-A6B-16-Extreme-128k-context

Text Generation

DavidAU committed on May 4

Commit 420e630 · verified · 1 Parent(s): 0e75f4b

Update README.md

Files changed (1)

README.md +3 -3

@@ -35,9 +35,9 @@ Use Jinja Template or CHATML template.
 IMPORTANT NOTES:
-- Due to the unique nature (MOE, Size, Activated experts) of this model GGUF quants can be run on the CPU, GPU or with GPU part "off-load", right up to full precision.
-- This model is difficult to Imatrix : You need a much larger imatrix file / multi-language / multi-content to imatrix it.
-- GPU speeds will be BLISTERING 4x-8x or higher than CPU only AND relative to other "30B" models (equal roughly to 6B "normal" model speeds).
 Please refer the org model card for details, benchmarks, how to use, settings, system roles etc etc :

 IMPORTANT NOTES:
+- Due to the unique nature (MOE, Size, Activated experts, size of experts) of this model GGUF quants can be run on the CPU, GPU or with GPU part "off-load", right up to full precision.
+- This model is difficult to Imatrix : You need a much larger imatrix file / multi-language / multi-content (ie code/text) to imatrix it.
+- GPU speeds will be BLISTERING 4x-8x or higher than CPU only speeds AND this model will be BLISTERING too, relative to other "30B" models (Token per second speed equal roughly to 6B "normal" model speeds).
 Please refer the org model card for details, benchmarks, how to use, settings, system roles etc etc :