Qwen3-235B-A22B-128K-Q8_0 does not run with vllm, or llama.cpp
#9
by vmajor - opened
EDIT: I am trying to run Qwen3-235B-A22B-128K-Q8_0
vllm says this:
```
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture qwen3moe is not supported yet.
```
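For anyone debugging the same thing: the architecture string vLLM complains about is stored in the GGUF header, so it's easy to confirm what the file actually declares. A minimal sketch, assuming the gguf Python package from the llama.cpp repo (`pip install gguf`); the exact way string fields are decoded can differ between gguf versions:

```python
# Sketch: print the architecture declared in the GGUF header.
# Assumes the gguf package (ships with llama.cpp, also on PyPI).
from gguf import GGUFReader

MODEL_PATH = "/media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf"

reader = GGUFReader(MODEL_PATH)
field = reader.get_field("general.architecture")
if field is None:
    print("no general.architecture key -- header looks damaged")
else:
    # For string-typed fields the value bytes sit in parts[data[0]].
    arch = bytes(field.parts[field.data[0]]).decode("utf-8")
    print(f"declared architecture: {arch}")  # expected: qwen3moe
```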
and llama.cpp refuses to load the model with a generic error.
Both vLLM and llama.cpp were built yesterday from the latest git pull.
How do you run this model?
vmajor changed discussion title from "Does not run with vllm, or llama.cpp" to "Qwen3-235B-A22B-128K-Q8_0 does not run with vllm, or llama.cpp"
Could it be that it's too big? vLLM doesn't support it yet, but llama.cpp should definitely work.
...no. I can load R1 Zero q5 quant comfortably and that is far larger than the Qwen3-235B-A22B-128K-Q8_0.
Interestingly, LM Studio, which uses llama.cpp, at least shows me the model info; llama-server and llama-cli flat out refuse to load it.
```
common_init_from_params: failed to load model '/media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf'
main: error: unable to load model
```
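If it helps narrow down where llama.cpp gives up, a rough sketch with llama-cpp-python (assuming it was built against a llama.cpp revision that already knows the qwen3moe architecture; the context size here is arbitrary) keeps the loader's verbose output and surfaces the exception instead of only the terse "unable to load model":

```python
# Sketch: attempt the load through llama-cpp-python with verbose logging,
# so the underlying llama.cpp loader prints why it rejects the file
# (unknown architecture, bad magic, missing tensors, ...).
from llama_cpp import Llama

MODEL_PATH = "/media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf"

try:
    llm = Llama(model_path=MODEL_PATH, n_ctx=4096, verbose=True)
    print("model loaded OK")
except Exception as exc:
    print(f"load failed: {exc}")
```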