Update README.md
@@ -18,12 +18,7 @@ The hybrid quant employs different quantization levels on a per layer basis to i
 flexibility of trading off performance vs file size. Fewer parameter bits are used at deep layers
 and more bits at cortex layers to simultaneously optimize quantized size and model performance.
 This quant was designed to match IQ4_XS size and perform better than IQ4_XS while using all K-quants for faster CPU
-processing.
-This MoE model can be efficiently run by offloading expert tensors to CPU via -ot exps=CPU
-to open up very large context space. The smaller size of the optimally quantized parameters will give
-an effective boost in CPU processing speed due to reducing the memory BW needed to repeatedly copy them
-from main memory to SIMD regs. It can also run fully offloaded on GPU via RPC or a high-VRAM GPU. For
-this file the layer quants are as follows:
+processing. For this file the layer quants are as follows:
 ```
 LAYER_TYPES='[
 [0 ,"Q3_K_M"],[1 ,"Q3_K_M"],[2 ,"Q3_K_M"],[3 ,"Q3_K_M"],[4 ,"Q3_K_M"],[5 ,"Q3_K_M"],[6 ,"Q3_K_M"],[7 ,"Q3_K_M"],
@@ -44,6 +39,17 @@ Quant | size | PPL | Comment
 IQ4_XS | 16.6e9 | 9.15 | default embed and output
 Q4_K_H | 16.6e9 | 9.10 | Q4_K embed Q6_K output
 
+Usage:
+
+This MoE model can be efficiently run by offloading expert tensors to CPU via -ot exps=CPU
+to open up very large context space. The smaller size of the optimally quantized parameters will give
+an effective boost in CPU processing speed due to reducing the memory BW needed to repeatedly copy them
+from main memory to SIMD regs. It can also run fully offloaded on GPU via RPC or a high-VRAM GPU.
+
+Benchmarks:
+
+Partial evals for the model are given here: https://huggingface.co/spaces/steampunque/benchlm.
+
 ## Download the file from below:
 | Link | Type | Size/e9 B | Notes |
 |------|------|-----------|-------|
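The memory-bandwidth argument in the added Usage text can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the bits-per-weight figures are approximate values for llama.cpp quant types, not numbers from this model card, and the active-parameter count and bandwidth are made-up assumptions.

```python
# Rough, illustrative estimate of why smaller quants speed up CPU decode:
# token generation on CPU is typically memory-bandwidth bound, so tokens/s
# scales inversely with the bytes streamed from main memory per token.

# Approximate bits per weight for some llama.cpp quant types (assumed values).
BPW = {"Q3_K": 3.4375, "IQ4_XS": 4.25, "Q4_K": 4.5}

def bytes_per_token(active_params: float, bpw: float) -> float:
    """Bytes read from main memory per generated token."""
    return active_params * bpw / 8.0

def tokens_per_sec(active_params: float, bw_bytes: float, bpw: float) -> float:
    """Upper bound on decode speed when memory bandwidth is the bottleneck."""
    return bw_bytes / bytes_per_token(active_params, bpw)

if __name__ == "__main__":
    active = 3e9   # hypothetical active parameters per token for the MoE
    bw = 50e9      # hypothetical 50 GB/s of usable memory bandwidth
    for q in ("IQ4_XS", "Q3_K"):
        print(q, round(tokens_per_sec(active, bw, BPW[q]), 1))
    # In this bandwidth-bound model the speedup is just the bpw ratio:
    print("speedup", round(BPW["IQ4_XS"] / BPW["Q3_K"], 3))
```

Under these assumptions, dropping the deep layers from roughly IQ4_XS-level to Q3_K-level bit widths buys about a 1.24x decode-speed ceiling, which is the kind of "effective boost in CPU processing speed" the text describes.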