Any one can run this model with SGlang framework?

#13
by muziyongshixin - opened

I try to run this model with SGlang, but it is extremely slow. Does anyone have a good setting to run this model with SGlang?

Cognitive Computations org

Try run this with vLLM, it is much faster.

you can try this command python3 -m sglang.launch_server --host 0.0.0.0 --port 30000 --model-path models/DeepSeek-R1-AWQ --tp 8 --enable-p2p-check --trust-remote-code --dtype float16 --mem-fraction-static 0.95 --served-model-name deepseek-r1-awq --disable-cuda-graph with sglang==0.4.2
but the result is not as expected..I get empty content for some queries and the think is not complete

Sign up or log in to comment