vLLM deployment error
Command: VLLM_WORKER_MULTIPROC_METHOD="spawn" CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-72B-Instruct-AWQ --host 0.0.0.0 --port 8000 --pipeline-parallel-size 4
error:
ERROR 02-26 14:32:26 registry.py:306] Error in inspecting model architecture 'Qwen2_5_VLForConditionalGeneration'
ERROR 02-26 14:32:26 registry.py:306] Traceback (most recent call last):
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 507, in _run_in_subprocess
ERROR 02-26 14:32:26 registry.py:306] returned.check_returncode()
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/subprocess.py", line 502, in check_returncode
ERROR 02-26 14:32:26 registry.py:306] raise CalledProcessError(self.returncode, self.args, self.stdout,
ERROR 02-26 14:32:26 registry.py:306] subprocess.CalledProcessError: Command '['/home/anaconda3/envs/xinference/bin/python3', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
ERROR 02-26 14:32:26 registry.py:306]
ERROR 02-26 14:32:26 registry.py:306] The above exception was the direct cause of the following exception:
ERROR 02-26 14:32:26 registry.py:306]
ERROR 02-26 14:32:26 registry.py:306] Traceback (most recent call last):
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 304, in _try_inspect_model_cls
ERROR 02-26 14:32:26 registry.py:306] return model.inspect_model_cls()
ERROR 02-26 14:32:26 registry.py:306] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 275, in inspect_model_cls
ERROR 02-26 14:32:26 registry.py:306] return _run_in_subprocess(
ERROR 02-26 14:32:26 registry.py:306] ^^^^^^^^^^^^^^^^^^^
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 510, in _run_in_subprocess
ERROR 02-26 14:32:26 registry.py:306] raise RuntimeError(f"Error raised in subprocess:\n"
ERROR 02-26 14:32:26 registry.py:306] RuntimeError: Error raised in subprocess:
ERROR 02-26 14:32:26 registry.py:306] /home/anaconda3/envs/xinference/lib/python3.11/site-packages/transformers/utils/hub.py:106: FutureWarning: Using TRANSFORMERS_CACHE
is deprecated and will be removed in v5 of Transformers. Use HF_HOME
instead.
ERROR 02-26 14:32:26 registry.py:306] warnings.warn(
ERROR 02-26 14:32:26 registry.py:306] :128: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
ERROR 02-26 14:32:26 registry.py:306] Traceback (most recent call last):
ERROR 02-26 14:32:26 registry.py:306] File "", line 198, in _run_module_as_main
ERROR 02-26 14:32:26 registry.py:306] File "", line 88, in _run_code
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 531, in
ERROR 02-26 14:32:26 registry.py:306] _run()
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 524, in _run
ERROR 02-26 14:32:26 registry.py:306] result = fn()
ERROR 02-26 14:32:26 registry.py:306] ^^^^
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 276, in
ERROR 02-26 14:32:26 registry.py:306] lambda: _ModelInfo.from_model_cls(self.load_model_cls()))
ERROR 02-26 14:32:26 registry.py:306] ^^^^^^^^^^^^^^^^^^^^^
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 279, in load_model_cls
ERROR 02-26 14:32:26 registry.py:306] mod = importlib.import_module(self.module_name)
ERROR 02-26 14:32:26 registry.py:306] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/importlib/__init__.py", line 126, in import_module
ERROR 02-26 14:32:26 registry.py:306] return _bootstrap._gcd_import(name[level:], package, level)
ERROR 02-26 14:32:26 registry.py:306] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-26 14:32:26 registry.py:306] File "", line 1204, in _gcd_import
ERROR 02-26 14:32:26 registry.py:306] File "", line 1176, in _find_and_load
ERROR 02-26 14:32:26 registry.py:306] File "", line 1147, in _find_and_load_unlocked
ERROR 02-26 14:32:26 registry.py:306] File "", line 690, in _load_unlocked
ERROR 02-26 14:32:26 registry.py:306] File "", line 940, in exec_module
ERROR 02-26 14:32:26 registry.py:306] File "", line 241, in _call_with_frames_removed
ERROR 02-26 14:32:26 registry.py:306] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 36, in
ERROR 02-26 14:32:26 registry.py:306] from transformers.models.qwen2_5_vl import (Qwen2_5_VLImageProcessor,
ERROR 02-26 14:32:26 registry.py:306] ImportError: cannot import name 'Qwen2_5_VLImageProcessor' from 'transformers.models.qwen2_5_vl' (/home/anaconda3/envs/xinference/lib/python3.11/site-packages/transformers/models/qwen2_5_vl/__init__.py)
ERROR 02-26 14:32:26 registry.py:306]
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 911, in
uvloop.run(run_server(args))
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
return runner.run(wrapper())
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
async with build_async_engine_client(args) as engine_client:
File "/home/anaconda3/envs/xinference/lib/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/home/anaconda3/envs/xinference/lib/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 160, in build_async_engine_client_from_engine_args
engine_client = AsyncLLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/async_llm_engine.py", line 639, in from_engine_args
engine_config = engine_args.create_engine_config(usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 1075, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/arg_utils.py", line 998, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/config.py", line 364, in __init__
self.multimodal_config = self._init_multimodal_config(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/config.py", line 424, in _init_multimodal_config
if ModelRegistry.is_multimodal_model(architectures):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 445, in is_multimodal_model
model_cls, _ = self.inspect_model_cls(architectures)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 405, in inspect_model_cls
return self._raise_for_unsupported(architectures)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/registry.py", line 357, in _raise_for_unsupported
raise ValueError(
ValueError: Model architectures ['Qwen2_5_VLForConditionalGeneration'] failed to be inspected. Please check the logs for more details.
@classdemo
To resolve this, apply the workaround described in
https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ/discussions/7
and pin transformers to the commit it references:
pip install --force-reinstall git+https://github.com/huggingface/transformers.git@9985d06add07a4cc691dc54a7e34f54205c04d40
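Before (or after) reinstalling, you can check whether the installed transformers build exports the exact symbol vLLM's `qwen2_5_vl.py` tries to import. This is a minimal sketch; the helper name `symbol_available` is ours, not part of either library:

```python
import importlib
import importlib.util

def symbol_available(module_name: str, symbol: str) -> bool:
    """Return True if `module_name` can be imported and exports `symbol`."""
    # Bail out early if the top-level package is not installed at all.
    if importlib.util.find_spec(module_name.split(".")[0]) is None:
        return False
    try:
        mod = importlib.import_module(module_name)
    except Exception:
        return False  # submodule does not exist in this version
    return hasattr(mod, symbol)

# If this prints False, the installed transformers lacks the class that
# vllm/model_executor/models/qwen2_5_vl.py imports, and the pinned-commit
# reinstall above is the fix.
print(symbol_available("transformers.models.qwen2_5_vl",
                       "Qwen2_5_VLImageProcessor"))
```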
The following command deploys successfully (python -m vllm.entrypoints.openai.api_server):
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=3,4,5,6 python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-VL-72B-Instruct-AWQ --quantization awq_marlin --tensor-parallel-size 4 --max-model-len 28672 --gpu-memory-utilization 0.99 --max-num-batched-tokens 28672 --max-num-seqs 64 --host 0.0.0.0 --port 8000 --disable-custom-all-reduce --block-size 16
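Once the server is up it speaks the OpenAI-compatible chat-completions protocol on port 8000. A sketch of a multimodal request body (the image URL is a placeholder, not a real asset):

```python
import json

# Hedged example of a chat-completions payload for the server started above.
payload = {
    "model": "Qwen/Qwen2.5-VL-72B-Instruct-AWQ",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "image_url",
             "image_url": {"url": "https://example.com/demo.jpg"}},  # placeholder
            {"type": "text", "text": "Describe this image."},
        ],
    }],
    "max_tokens": 256,
}
body = json.dumps(payload)
print(body[:60])
```

Send `body` with any HTTP client, e.g. POST to http://localhost:8000/v1/chat/completions with header Content-Type: application/json.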
The following command fails (vllm serve):
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True CUDA_VISIBLE_DEVICES=3,4,5,6 vllm serve Qwen/Qwen2.5-VL-72B-Instruct-AWQ --quantization awq_marlin --tensor-parallel-size 4 --max-model-len 28672 --gpu-memory-utilization 0.99 --max-num-batched-tokens 28672 --max-num-seqs 64 --host 0.0.0.0 --port 8000 --block-size 16 --disable-custom-all-reduce
error:
(VllmWorkerProcess pid=1808916) INFO 02-28 01:21:50 config.py:3054] cudagraph sizes specified by model runner [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64] is overridden by config [64, 32, 2, 1, 4, 40, 8, 48, 16, 56, 24]
WARNING 02-28 01:21:50 awq_marlin.py:132] Layer 'language_model.model.layers.0.mlp.down_proj' is not supported by AWQMarlin. Falling back to unoptimized AWQ kernels.
(VllmWorkerProcess pid=1808917) WARNING 02-28 01:21:50 awq_marlin.py:132] Layer 'language_model.model.layers.0.mlp.down_proj' is not supported by AWQMarlin. Falling back to unoptimized AWQ kernels.
(VllmWorkerProcess pid=1808918) WARNING 02-28 01:21:50 awq_marlin.py:132] Layer 'language_model.model.layers.0.mlp.down_proj' is not supported by AWQMarlin. Falling back to unoptimized AWQ kernels.
(VllmWorkerProcess pid=1808916) WARNING 02-28 01:21:50 awq_marlin.py:132] Layer 'language_model.model.layers.0.mlp.down_proj' is not supported by AWQMarlin. Falling back to unoptimized AWQ kernels.
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] Traceback (most recent call last):
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 236, in _run_worker_process
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] return func(*args, **kwargs)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.model_runner.load_model()
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.model = get_model(vllm_config=self.vllm_config)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] return loader.load_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] model = _initialize_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 781, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.language_model = init_vllm_registered_model(
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 260, in init_vllm_registered_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] return _initialize_model(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.model = Qwen2Model(vllm_config=vllm_config,
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.start_layer, self.end_layer, self.layers = make_layers(
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] [PPMissingLayer() for _ in range(start_layer)] + [
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] lambda prefix: Qwen2DecoderLayer(config=config,
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.mlp = Qwen2MLP(
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.down_proj = RowParallelLinear(
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] self.quant_method.create_weights(
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/layers/quantization/awq.py", line 104, in create_weights
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] raise ValueError(
(VllmWorkerProcess pid=1808917) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
(VllmWorkerProcess pid=1808918) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=1808918) ERROR 02-28 01:21:50 multiproc_worker_utils.py:242] ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
ERROR 02-28 01:21:51 engine.py:400] The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
ERROR 02-28 01:21:51 engine.py:400] Traceback (most recent call last):
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
ERROR 02-28 01:21:51 engine.py:400] engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args
ERROR 02-28 01:21:51 engine.py:400] return cls(ipc_path=ipc_path,
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.engine = LLMEngine(*args, **kwargs)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 271, in __init__
ERROR 02-28 01:21:51 engine.py:400] super().__init__(*args, **kwargs)
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__
ERROR 02-28 01:21:51 engine.py:400] self._init_executor()
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
ERROR 02-28 01:21:51 engine.py:400] self._run_workers("load_model",
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
ERROR 02-28 01:21:51 engine.py:400] driver_worker_output = run_method(self.driver_worker, sent_method,
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method
ERROR 02-28 01:21:51 engine.py:400] return func(*args, **kwargs)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
ERROR 02-28 01:21:51 engine.py:400] self.model_runner.load_model()
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
ERROR 02-28 01:21:51 engine.py:400] self.model = get_model(vllm_config=self.vllm_config)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 02-28 01:21:51 engine.py:400] return loader.load_model(vllm_config=vllm_config)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model
ERROR 02-28 01:21:51 engine.py:400] model = _initialize_model(vllm_config=vllm_config)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
ERROR 02-28 01:21:51 engine.py:400] return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 781, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.language_model = init_vllm_registered_model(
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 260, in init_vllm_registered_model
ERROR 02-28 01:21:51 engine.py:400] return _initialize_model(vllm_config=vllm_config, prefix=prefix)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
ERROR 02-28 01:21:51 engine.py:400] return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.model = Qwen2Model(vllm_config=vllm_config,
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__
ERROR 02-28 01:21:51 engine.py:400] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers
ERROR 02-28 01:21:51 engine.py:400] [PPMissingLayer() for _ in range(start_layer)] + [
ERROR 02-28 01:21:51 engine.py:400] ^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in <listcomp>
ERROR 02-28 01:21:51 engine.py:400] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in <lambda>
ERROR 02-28 01:21:51 engine.py:400] lambda prefix: Qwen2DecoderLayer(config=config,
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.mlp = Qwen2MLP(
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.down_proj = RowParallelLinear(
ERROR 02-28 01:21:51 engine.py:400] ^^^^^^^^^^^^^^^^^^
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__
ERROR 02-28 01:21:51 engine.py:400] self.quant_method.create_weights(
ERROR 02-28 01:21:51 engine.py:400] File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/layers/quantization/awq.py", line 104, in create_weights
ERROR 02-28 01:21:51 engine.py:400] raise ValueError(
ERROR 02-28 01:21:51 engine.py:400] ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
ERROR 02-28 01:21:51 multiproc_worker_utils.py:124] Worker VllmWorkerProcess pid 1808918 died, exit code: -15
INFO 02-28 01:21:51 multiproc_worker_utils.py:128] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
File "/home/anaconda3/envs/xinference/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/home/anaconda3/envs/xinference/lib/python3.11/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 402, in run_mp_engine
raise e
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 124, in from_engine_args
return cls(ipc_path=ipc_path,
^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 76, in __init__
self.engine = LLMEngine(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
self.model_executor = executor_class(vllm_config=vllm_config, )
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 271, in __init__
super().__init__(*args, **kwargs)
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/executor_base.py", line 52, in __init__
self._init_executor()
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 125, in _init_executor
self._run_workers("load_model",
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/executor/mp_distributed_executor.py", line 185, in _run_workers
driver_worker_output = run_method(self.driver_worker, sent_method,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/utils.py", line 2196, in run_method
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
self.model_runner.load_model()
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
self.model = get_model(vllm_config=self.vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
return loader.load_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model
model = _initialize_model(vllm_config=vllm_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
return model_class(vllm_config=vllm_config, prefix=prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 781, in __init__
self.language_model = init_vllm_registered_model(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 260, in init_vllm_registered_model
return _initialize_model(vllm_config=vllm_config, prefix=prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 125, in _initialize_model
return model_class(vllm_config=vllm_config, prefix=prefix)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 453, in __init__
self.model = Qwen2Model(vllm_config=vllm_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__
old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 307, in __init__
self.start_layer, self.end_layer, self.layers = make_layers(
^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 557, in make_layers
[PPMissingLayer() for _ in range(start_layer)] + [
^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/utils.py", line 558, in <listcomp>
maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 309, in <lambda>
lambda prefix: Qwen2DecoderLayer(config=config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 220, in __init__
self.mlp = Qwen2MLP(
^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/models/qwen2.py", line 82, in __init__
self.down_proj = RowParallelLinear(
^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/layers/linear.py", line 1062, in __init__
self.quant_method.create_weights(
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/model_executor/layers/quantization/awq.py", line 104, in create_weights
raise ValueError(
ValueError: The input size is not aligned with the quantized weight shape. This can be caused by too large tensor parallel size.
[rank0]:[W228 01:21:51.045376236 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Traceback (most recent call last):
File "/home/anaconda3/envs/xinference/bin/vllm", line 8, in <module>
sys.exit(main())
^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
args.dispatch_function(args)
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/cli/serve.py", line 34, in cmd
uvloop.run(run_server(args))
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/uvloop/__init__.py", line 105, in run
return runner.run(wrapper())
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 947, in run_server
async with build_async_engine_client(args) as engine_client:
File "/home/anaconda3/envs/xinference/lib/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 139, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/home/anaconda3/envs/xinference/lib/python3.11/contextlib.py", line 210, in __aenter__
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/home/anaconda3/envs/xinference/lib/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 233, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
(xinference) [root@llm16 models]# /home/anaconda3/envs/xinference/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
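The root-cause `ValueError` from `awq.py` says the `down_proj` input size on each rank is not a multiple of the AWQ quantization group, i.e. some tensor-parallel split is still being applied to the quantized MLP weights. A minimal sketch of that divisibility check, assuming `intermediate_size = 29568` and `group_size = 128` (typical values for the Qwen2.5-72B AWQ checkpoint — assumptions here, not read from this log):

```python
# Hypothetical alignment check: AWQ requires each rank's shard of the
# MLP down_proj input (intermediate_size // tp) to be a multiple of the
# quantization group_size, otherwise create_weights raises ValueError.
intermediate_size = 29568  # assumed from the model's config.json
group_size = 128           # assumed AWQ group size

compatible = [
    tp for tp in range(1, 9)
    if intermediate_size % tp == 0                 # weights split evenly
    and (intermediate_size // tp) % group_size == 0  # shard stays group-aligned
]
print(compatible)  # → [1, 3, 7]; tp=2, 4, 8 all misalign the shards
```

If those assumed sizes match the actual checkpoint, a tensor-parallel degree of 4 cannot shard these weights evenly, which would explain the error even though the command requested `--pipeline-parallel-size 4`: something in the setup is still splitting the quantized layers by tensor parallelism. Verifying that the effective `tensor_parallel_size` is 1 (or a compatible value such as 3 or 7) would be the first thing to check.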