Qwen3-235B-A22B-128K-Q8_0 does not run with vllm, or llama.cpp

#9
by vmajor - opened

EDIT: I am trying to run Qwen3-235B-A22B-128K-Q8_0

vllm says this:
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture qwen3moe is not supported yet.

and llama.cpp refuses to load the model with a generic error

Both vLLM and llama.cpp were built from the latest git pull yesterday.

How do you run this model?
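
For reference, a minimal llama-cli / llama-server invocation sketch of the sort I would expect to work (context size, GPU layer count, and the shard naming below are placeholders, not known values; if the Q8_0 download ships as multiple shards, `-m` should point at the first shard):

```sh
# Placeholder values; adjust to the actual files and hardware.
# If the quant ships as shards (e.g. ...-00001-of-0000N.gguf), pass the
# first shard and llama.cpp picks up the remaining shards automatically.
./llama-cli \
  -m /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf \
  -c 8192 \
  -ngl 0 \
  -p "Hello"

# Same model behind the HTTP server:
./llama-server \
  -m /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf \
  -c 8192 \
  --host 0.0.0.0 --port 8080
```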

vmajor changed discussion title from Does not run with vllm, or llama.cpp to Qwen3-235B-A22B-128K-Q8_0 does not run with vllm, or llama.cpp
Unsloth AI org

Could it be possible that it's too big? vLLM doesn't support it yet, but llama.cpp should definitely work.

...no. I can load the R1 Zero Q5 quant comfortably, and that is far larger than Qwen3-235B-A22B-128K-Q8_0.

Interestingly, LM Studio, which uses llama.cpp, at least shows me the model info; llama-server and llama-cli flat out refuse to load it.


```
common_init_from_params: failed to load model '/media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf'
main: error: unable to load model
```
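
In case the file itself is the problem, a quick sanity-check sketch (this assumes the quant was published as split shards and that the `llama-gguf-split` tool from the same llama.cpp build is available; the shard filenames are placeholders):

```sh
# Compare the size on disk with what the repo lists; a truncated download
# fails with exactly this kind of generic load error.
ls -lh /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0*.gguf

# If the quant was published as shards, either point -m at the first shard
# directly, or merge them into a single file first (placeholder names):
./llama-gguf-split --merge \
  /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0-00001-of-00003.gguf \
  /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf
```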
