Qwen3-235B-A22B-128K-Q8_0 does not run with vllm, or llama.cpp

#9
by vmajor - opened

EDIT: I am trying to run Qwen3-235B-A22B-128K-Q8_0

vllm says this:
raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
ValueError: GGUF model with architecture qwen3moe is not supported yet.

and llama.cpp refuses to load the model with a generic error

Both vLLM and llama.cpp were built from the latest git pull yesterday.

How do you run this model?
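
For reference, a minimal llama-cli / llama-server invocation sketch of the sort I would expect to work (context size, GPU layer count, and the shard naming below are placeholders, not known values; if the Q8_0 download ships as multiple shards, `-m` should point at the first shard):

```sh
# Placeholder values; adjust to the actual files and hardware.
# If the quant ships as shards (e.g. ...-00001-of-0000N.gguf), pass the
# first shard and llama.cpp picks up the remaining shards automatically.
./llama-cli \
  -m /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf \
  -c 8192 \
  -ngl 0 \
  -p "Hello"

# Same model behind the HTTP server:
./llama-server \
  -m /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf \
  -c 8192 \
  --host 0.0.0.0 --port 8080
```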

vmajor changed discussion title from Does not run with vllm, or llama.cpp to Qwen3-235B-A22B-128K-Q8_0 does not run with vllm, or llama.cpp
Unsloth AI org

Could it be possible that it's too big? vLLM doesn't support it yet, but llama.cpp should definitely work.

...no. I can load the R1 Zero Q5 quant comfortably, and that is far larger than Qwen3-235B-A22B-128K-Q8_0.

Interestingly, LM Studio, which uses llama.cpp, at least shows me the model info; llama-server and llama-cli flat out refuse to load it.


```
common_init_from_params: failed to load model '/media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf'
main: error: unable to load model
```
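
In case the file itself is the problem, a quick sanity-check sketch (this assumes the quant was published as split shards and that the `llama-gguf-split` tool from the same llama.cpp build is available; the shard filenames are placeholders):

```sh
# Compare the size on disk with what the repo lists; a truncated download
# fails with exactly this kind of generic load error.
ls -lh /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0*.gguf

# If the quant was published as shards, either point -m at the first shard
# directly, or merge them into a single file first (placeholder names):
./llama-gguf-split --merge \
  /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0-00001-of-00003.gguf \
  /media/user/AI2/models/Qwen3-235B-A22B-128K-Q8_0.gguf
```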
