fixup readme
README.md CHANGED
@@ -40,7 +40,7 @@ Special mix `IQ3_K_R4`/`IQ2_K_R4` routed experts with all other layers full `q8_
 Great for CPU+GPU "troll rig" high end gamer systems e.g. 9950X 96 GB RAM + 3090TI 24 GB VRAM + Gen 5 NVMe SSD.
 
 #### Custom Mixes
-If you have more than 48GB VRAM across multiple GPUs, consider rolling your
+If you have more than 48GB VRAM across multiple GPUs, consider rolling your own custom quants to optimize size and performance for whatever hardware you have, using a custom `-ot` expression. If you have less VRAM, you could make a custom quant leaner in the non-routed expert layers, or get 64k+ context in 24GB VRAM. You can also use the offline repack tool if you want to run CPU-only with `mmap()` still enabled.
 
 ## Quick Start
 #### `ik_llama.cpp` API server for GPU+CPU
@@ -95,7 +95,7 @@ numactl -N 0 -m 0 \
 
 ## Quant Comparisons
 
-These are probably the **best quants available in this size class** for `V3-0324`!
+These are probably among the **best quants available in this size class** for `V3-0324`!
 
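The new paragraph refers to a custom `-ot` (`--override-tensor`) expression for placing tensors on specific backends. A minimal sketch of that kind of invocation is below; the model filename, regex, and context size are illustrative assumptions, not values taken from this README:

```shell
# Hedged sketch, not the README's exact command: pin the large routed-expert
# FFN tensors (names matching the regex) to the CPU buffer so they stay in
# system RAM, while the rest of the model is offloaded to VRAM with -ngl.
./build/bin/llama-server \
    --model DeepSeek-V3-0324-IQ2_K_R4.gguf \  # placeholder model path
    --ctx-size 65536 \                        # 64k context, as the text suggests
    -ngl 99 \                                 # offload all other layers to GPU
    -ot "ffn_.*_exps=CPU"                     # routed experts -> system RAM
```

The `-ot` argument takes `regex=buffer` pairs, so varying the pattern is how you make a mix "leaner in the non-routed expert layers" or rebalance between RAM and VRAM for your own hardware.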