QuantTrio/GLM-4.6-AWQ

Text Generation

4-bit precision


bullerwins committed 9 days ago

Commit 4306f75 · verified · 1 Parent(s): 10ec42d

Update README.md

Small update to the repo ID in the vllm command.
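A rename like this one touches a single line if the repo ID is kept in a variable. The sketch below is only an illustration, not the README's actual layout: `MODEL_REPO`, `SERVED_NAME`, and `VLLM_CMD` are hypothetical names, and only the repo ID and the flags visible in this commit's diff are included. The README's full command continues with further flags that this hunk does not show.

```shell
# Hypothetical variable names; the repo ID and flags are those visible in this commit's diff.
MODEL_REPO="QuantTrio/GLM-4.6-AWQ"   # was tclf90/GLM-4.6-AWQ before this commit
SERVED_NAME="My_Model"

# Assemble the command as a string so it can be reviewed before running.
# The remaining flags of the README's command are not shown in this hunk and are omitted here.
VLLM_CMD="vllm serve $MODEL_REPO --served-model-name $SERVED_NAME --enable-auto-tool-choice --tool-call-parser glm45"

# Print the command instead of executing it.
echo "$VLLM_CMD"
```

Printing the command first makes it easy to confirm that the new repo ID is the one being served before launching anything.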

Files changed (1)

README.md +1 -1


@@ -26,7 +26,7 @@ otherwise the expert tensors couldn’t be evenly sharded across GPU devices.</i
 ```
 CONTEXT_LENGTH=32768
 vllm serve \
-    tclf90/GLM-4.6-AWQ \
+    QuantTrio/GLM-4.6-AWQ \
     --served-model-name My_Model \
     --enable-auto-tool-choice \
     --tool-call-parser glm45 \