gaunernst/gemma-3-12b-it-qat-compressed-tensors

Thank you for the compressed tensors.

I am new to vLLM. Why am I getting a gptq_marlin error when I try to run this model?

docker .... vllm-rocm vllm serve /app/model --served-model-name gaunernst/gemma-3-12b-it-qat-compressed-tensors
--tensor-parallel-size 1
--gpu-memory-utilization 0.90
--dtype bfloat16
--max-model-len 1024
--max-num-seqs=4
--trust-remote-code
--enforce-eager
--quantization compressed-tensors
.......

Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:02<00:02, 2.53s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:05<00:00, 3.02s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:05<00:00, 2.95s/it]

INFO 04-25 14:45:09 [loader.py:458] Loading weights took 6.37 seconds
ERROR 04-25 14:45:09 [engine.py:448] '_OpNamespace' '_C' object has no attribute 'gptq_marlin_repack'
ERROR 04-25 14:45:09 [engine.py:448] Traceback (most recent call last):
ERROR 04-25 14:45:09 [engine.py:448] File "/usr/local/lib/python3.12/dist-packages/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 04-25 14:45:09 [engine.py:448] engine = MQLLMEngine.from_vllm_config(
ERROR 04-25 14:45:09 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[...]
ERROR 04-25 14:54:37 [engine.py:448] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/marlin.py", line 95, in transform_w_q
ERROR 04-25 14:54:37 [engine.py:448] x.data = ops.gptq_marlin_repack(x.data.contiguous(),
ERROR 04-25 14:54:37 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 14:54:37 [engine.py:448] File "/usr/local/lib/python3.12/dist-packages/vllm/_custom_ops.py", line 776, in gptq_marlin_repack
ERROR 04-25 14:54:37 [engine.py:448] return torch.ops._C.gptq_marlin_repack(b_q_weight, perm, size_k, size_n,
ERROR 04-25 14:54:37 [engine.py:448] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 14:54:37 [engine.py:448] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1231, in __getattr__
ERROR 04-25 14:54:37 [engine.py:448] raise AttributeError(
ERROR 04-25 14:54:37 [engine.py:448] AttributeError: '_OpNamespace' '_C' object has no attribute 'gptq_marlin_repack'
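
For anyone trying to reproduce this: the traceback ends at a lookup of torch.ops._C.gptq_marlin_repack, so a quick way to confirm whether the op exists in a given build is to ask PyTorch directly. The snippet below is only a minimal sketch. The helper name has_custom_op is made up for illustration, and it assumes that importing vllm._custom_ops is what loads vLLM's compiled extension and registers its ops.

```python
import importlib


def has_custom_op(op_name: str, namespace: str = "_C") -> bool:
    """Return True if torch.ops.<namespace>.<op_name> is registered."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return False  # torch is not installed in this environment

    try:
        # Assumption: importing this module loads vLLM's compiled extension,
        # which registers the custom ops under torch.ops._C.
        importlib.import_module("vllm._custom_ops")
    except Exception:
        pass  # vLLM missing or failed to import; only already-loaded ops are visible

    try:
        return hasattr(getattr(torch.ops, namespace), op_name)
    except (AttributeError, RuntimeError):
        # Some PyTorch versions raise RuntimeError for an unknown op.
        return False


if __name__ == "__main__":
    print("gptq_marlin_repack available:", has_custom_op("gptq_marlin_repack"))
```

If this prints False inside the same container, the op was not compiled into that build, which would match the AttributeError above.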
