Text Generation
GGUF
English
creative
creative writing
fiction writing
plot generation
sub-plot generation
story generation
scene continue
storytelling
fiction story
science fiction
romance
all genres
story
writing
vivid prose
vivid writing
fiction
roleplaying
bfloat16
swearing
rp
llama3
llama-3
enhanced quants
max quants
maxcpu quants
horror
mergekit
conversational
Update README.md
README.md
CHANGED
@@ -36,9 +36,8 @@ pipeline_tag: text-generation
 
 <B>L3-Dark-Planet-8B-GGUF - Updates Dec 21 2024: (uploading quants ... refreshed, and new quants):</B>
 - All quants have been "refreshed", quantized with the latest LLAMACPP improvements: better instruction following and output generation across all quants.
-- All quants have also been upgraded with "more bits" for the output tensor and embed for better performance (this is in addition to the "refresh").
-
-- New "ARM" quants have been added for machines that can run them. (format: ".../Q4_0_4_4.gguf")
+- All quants have also been upgraded with "more bits" for the output tensor (all set at Q8_0) and embed for better performance (this is in addition to the "refresh").
+- New "ARM" quants have been added for machines that can run them, with the output tensor set at Q8_0. (format: ".../Q4_0_4_4.gguf")
 - New specialized quants (in addition to the refresh/upgrades): "max" and "max-cpu" (included in the file name) for quants "Q2K" (max-cpu only), "IQ4_XS", "Q6_K" and "Q8_0".
 - "MAX": output tensor / embed at float 16 (better instruction following/output generation than standard quants).
 - "MAX-CPU": output tensor / embed at bfloat 16, which forces these onto the CPU (behavior on Nvidia and other cards will vary). This frees up VRAM at a cost in tokens/second, and you still get better instruction following/output generation.
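The "MAX"-style recipe described above (standard quant for most tensors, output tensor and token embeddings kept at higher precision) can be sketched with llama.cpp's `llama-quantize` tool. This is an illustration, not the author's exact pipeline: the file names are placeholders, and the `--output-tensor-type` / `--token-embedding-type` flags require a reasonably recent llama.cpp build.

```shell
# Baseline Q6_K quant from a full-precision GGUF (placeholder file names):
./llama-quantize model-f16.gguf model-Q6_K.gguf Q6_K

# "MAX"-style Q6_K: hold the output tensor and token embeddings at F16
# while quantizing the rest to Q6_K, trading a little file size for
# better instruction following / output generation.
./llama-quantize \
  --output-tensor-type f16 \
  --token-embedding-type f16 \
  model-f16.gguf model-Q6_K-MAX.gguf Q6_K
```

For the "MAX-CPU" variant, the same idea applies with bfloat16 tensor types; since many GPU backends do not run bf16 tensors, those tensors stay on the CPU, freeing VRAM at some cost in tokens per second.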