fixup readme
README.md CHANGED
@@ -40,7 +40,7 @@ Special mix `IQ3_K_R4`/`IQ2_K_R4` routed experts with all other layers full `q8_
 Great for CPU+GPU "troll rig" high end gamer systems e.g. 9950X 96 GB RAM + 3090TI 24 GB VRAM + Gen 5 NVMe SSD.
 
 #### Custom Mixes
-If you have more than 48GB VRAM across multiple GPUs, consider rolling your
+If you have more than 48GB VRAM across multiple GPUs, consider rolling your own custom quants to optimize size and performance for whatever hardware you have, using a custom `-ot` expression. If you have less VRAM, you could make a custom quant leaner in the non-routed expert layers, or get 64k+ context in 24GB VRAM. You can also use the offline repack tool if you want to run CPU-only with `mmap()` still enabled.
 
 ## Quick Start
 #### `ik_llama.cpp` API server for GPU+CPU
@@ -95,7 +95,7 @@ numactl -N 0 -m 0 \
 
 ## Quant Comparisons
 
-These are probably the **best quants available in this size class** for `V3-0324`!
+These are probably among the **best quants available in this size class** for `V3-0324`!
 
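The new paragraph refers to a custom `-ot` (`--override-tensor`) expression for placing tensors on specific backends. A minimal sketch of that kind of invocation is below; the model filename, regex, and context size are illustrative assumptions, not values taken from this README:

```shell
# Hedged sketch, not the README's exact command: pin the large routed-expert
# FFN tensors (names matching the regex) to the CPU buffer so they stay in
# system RAM, while the rest of the model is offloaded to VRAM with -ngl.
./build/bin/llama-server \
    --model DeepSeek-V3-0324-IQ2_K_R4.gguf \  # placeholder model path
    --ctx-size 65536 \                        # 64k context, as the text suggests
    -ngl 99 \                                 # offload all other layers to GPU
    -ot "ffn_.*_exps=CPU"                     # routed experts -> system RAM
```

The `-ot` argument takes `regex=buffer` pairs, so varying the pattern is how you make a mix "leaner in the non-routed expert layers" or rebalance between RAM and VRAM for your own hardware.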