imi2/llama-2-7b-chat-pure-Q4_0-gguf

imi2 committed on Apr 12

Commit 95260dd · verified · 1 Parent(s): 90fbe4d

Update README.md

Files changed (1)

README.md +2 -0

README.md CHANGED

@@ -62,3 +62,5 @@ TODO:
 **INT_N isn't the equivalent or a match for fair comparison. It is 16.3% faster and 13% smaller in this scenario.**
 - [Issue Link](https://github.com/microsoft/T-MAC/issues/79)

+AutoGPTQ is used. By default it uses a group size of 128, which gives it a lower bpw (bits per weight) and a smaller file than the llama.cpp quantization. See https://qwen.readthedocs.io/en/latest/quantization/gptq.html
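
The effect of group size on bpw can be sketched with a rough count of bits per weight. This is only an illustration, not a measurement of these files. It assumes llama.cpp Q4_0 stores one fp16 scale per block of 32 4-bit weights, and that a 4-bit GPTQ checkpoint stores one fp16 scale plus one 4-bit zero point per group of 128 weights. The helper `bits_per_weight` is a hypothetical name, and the count ignores non-quantized tensors and file metadata, so it will not match measured file sizes exactly.

```python
def bits_per_weight(weight_bits, group_size, scale_bits, zero_bits=0):
    """Average storage cost per weight, including per-group overhead.

    Hypothetical helper: each group of `group_size` weights is assumed to
    carry one scale (and optionally one zero point) on top of the weights.
    """
    overhead_bits = scale_bits + zero_bits
    return weight_bits + overhead_bits / group_size


# Assumed layout for llama.cpp Q4_0: 32 weights per block, one fp16 scale.
q4_0_bpw = bits_per_weight(weight_bits=4, group_size=32, scale_bits=16)

# Assumed layout for 4-bit GPTQ with group size 128:
# one fp16 scale and one 4-bit zero point per group.
gptq_bpw = bits_per_weight(weight_bits=4, group_size=128, scale_bits=16, zero_bits=4)

print(f"Q4_0 (block of 32):   {q4_0_bpw} bpw")
print(f"GPTQ (group of 128): {gptq_bpw} bpw")
```

Under these assumptions the larger group spreads the per-group overhead over four times as many weights, which is why the GPTQ figure comes out lower.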