W8A8-fp8 version
#12
by OveJie - opened
I tried to produce a W8A8-FP8 quantization with llmcompressor, following the method used for RedHatAI/gemma-3-27b-it-FP8-dynamic, but for some reason the resulting weight files are broken and cannot be loaded in vLLM v0.9.2.
vLLM v0.9.2 is supposed to have already fixed this bug, yet loading still fails:
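For context, data-free FP8-dynamic recipes in llmcompressor typically leave the multimodal parts of the model unquantized. A minimal YAML sketch of such a recipe follows; the `ignore` patterns and module names here are assumptions based on the Hugging Face Gemma 3 checkpoint layout, not the recipe actually used for the RedHatAI checkpoint:

```yaml
# Hypothetical llmcompressor oneshot recipe: FP8 weights plus dynamic FP8
# activations on Linear layers, skipping lm_head and the multimodal modules.
quant_stage:
  quant_modifiers:
    QuantizationModifier:
      targets: ["Linear"]
      scheme: FP8_DYNAMIC
      ignore: ["lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]
```

If the vision tower is not ignored, its Linear layers are saved with `weight_scale` tensors that vLLM's SigLIP loader may not know how to map, which matches the KeyError in the log below.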
INFO 07-17 05:42:13 [gpu_model_runner.py:1770] Starting to load model /models/gemma-3-27b-it-abliterated-W8A8-fp8...
INFO 07-17 05:42:14 [gpu_model_runner.py:1775] Loading model from scratch...
INFO 07-17 05:42:14 [cuda.py:287] Using FlexAttention backend on V1 engine.
INFO 07-17 05:42:14 [cuda.py:284] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/6 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 17% Completed | 1/6 [00:00<00:03, 1.48it/s]
Loading safetensors checkpoint shards: 33% Completed | 2/6 [00:01<00:02, 1.39it/s]
ERROR 07-17 05:42:17 [core.py:586] EngineCore failed to start.
ERROR 07-17 05:42:17 [core.py:586] Traceback (most recent call last):
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 577, in run_engine_core
ERROR 07-17 05:42:17 [core.py:586] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 404, in __init__
ERROR 07-17 05:42:17 [core.py:586] super().__init__(vllm_config, executor_class, log_stats,
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 75, in __init__
ERROR 07-17 05:42:17 [core.py:586] self.model_executor = executor_class(vllm_config)
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Process EngineCore_0:
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 53, in __init__
ERROR 07-17 05:42:17 [core.py:586] self._init_executor()
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 48, in _init_executor
ERROR 07-17 05:42:17 [core.py:586] self.collective_rpc("load_model")
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 57, in collective_rpc
ERROR 07-17 05:42:17 [core.py:586] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 2736, in run_method
ERROR 07-17 05:42:17 [core.py:586] return func(*args, **kwargs)
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 185, in load_model
ERROR 07-17 05:42:17 [core.py:586] self.model_runner.load_model()
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 1776, in load_model
ERROR 07-17 05:42:17 [core.py:586] self.model = model_loader.load_model(
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 41, in load_model
ERROR 07-17 05:42:17 [core.py:586] self.load_weights(model, model_config)
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 269, in load_weights
ERROR 07-17 05:42:17 [core.py:586] loaded_weights = model.load_weights(
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/gemma3_mm.py", line 720, in load_weights
ERROR 07-17 05:42:17 [core.py:586] return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 291, in load_weights
ERROR 07-17 05:42:17 [core.py:586] autoloaded_weights = set(self._load_module("", self.module, weights))
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 249, in _load_module
ERROR 07-17 05:42:17 [core.py:586] yield from self._load_module(prefix,
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 222, in _load_module
ERROR 07-17 05:42:17 [core.py:586] loaded_params = module_load_weights(weights)
ERROR 07-17 05:42:17 [core.py:586] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 07-17 05:42:17 [core.py:586] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/siglip.py", line 519, in load_weights
ERROR 07-17 05:42:17 [core.py:586] param = params_dict[name]
ERROR 07-17 05:42:17 [core.py:586] ~~~~~~~~~~~^^^^^^
ERROR 07-17 05:42:17 [core.py:586] KeyError: 'vision_model.encoder.layers.0.mlp.fc1.weight_scale'
https://github.com/vllm-project/llm-compressor/issues/1306#issuecomment-2942874690
Using that workaround, the W8A8-FP8 checkpoint runs, but the model as a whole still cannot be used.
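The failing key, `vision_model.encoder.layers.0.mlp.fc1.weight_scale`, belongs to the vision encoder, so a quantization recipe that ignores the vision tower should never emit it. The standard-library sketch below mimics llmcompressor-style `ignore` matching (where a `re:` prefix means regex, otherwise an exact module name) and checks that hypothetical vision-tower patterns would cover that layer; the patterns and module paths are assumptions, not taken from the linked issue:

```python
import re

# Hypothetical ignore patterns in llmcompressor's "re:" syntax (assumption:
# module names follow the Hugging Face Gemma 3 checkpoint layout).
ignore = ["lm_head", "re:vision_tower.*", "re:multi_modal_projector.*"]

def is_ignored(module_name: str) -> bool:
    """Return True if an ignore pattern covers the module.

    A "re:" prefix marks a regex anchored at the start of the name
    (re.match semantics); anything else must match the name exactly.
    """
    for pat in ignore:
        if pat.startswith("re:"):
            if re.match(pat[3:], module_name):
                return True
        elif pat == module_name:
            return True
    return False

# The layer whose scale was missing in the traceback lives under the
# vision tower in the HF checkpoint layout:
print(is_ignored("vision_tower.vision_model.encoder.layers.0.mlp.fc1"))  # True
# An ordinary language-model layer is not covered and would be quantized:
print(is_ignored("model.layers.0.mlp.gate_proj"))  # False
```

If both checks hold for the recipe you actually used, the vision weights should have stayed in their original dtype and the loader would not look for a `weight_scale` on them.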
OveJie changed discussion status to closed
OveJie changed discussion status to open