
[Bug]: Qwen2.5 7B W8A8 KeyError: 'model.layers.0.self_attn.q_proj.weight' #876

Open
Million-mo opened this issue May 15, 2025 · 1 comment
Labels
bug Something isn't working module:quantization

Comments

@Million-mo

Your current environment

PyTorch version: 2.5.1
Is debug build: False

OS: Ubuntu 22.04.5 LTS (aarch64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 4.0.0
Libc version: glibc-2.35

Python version: 3.10.17 (main, Apr 30 2025, 16:00:31) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.10.0-60.18.0.50.r865_35.hce2.aarch64-aarch64-with-glibc2.35

CPU:
Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          192
On-line CPU(s) list:             0-191
Vendor ID:                       HiSilicon
Model name:                      Kunpeng-920
Model:                           0
Thread(s) per core:              1
Core(s) per cluster:             48
Socket(s):                       -
Cluster(s):                      4
Stepping:                        0x1
BogoMIPS:                        200.00
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma dcpop asimddp asimdfhm ssbs
L1d cache:                       12 MiB (192 instances)
L1i cache:                       12 MiB (192 instances)
L2 cache:                        96 MiB (192 instances)
L3 cache:                        192 MiB (8 instances)
NUMA node(s):                    8
NUMA node0 CPU(s):               0-23
NUMA node1 CPU(s):               24-47
NUMA node2 CPU(s):               48-71
NUMA node3 CPU(s):               72-95
NUMA node4 CPU(s):               96-119
NUMA node5 CPU(s):               120-143
NUMA node6 CPU(s):               144-167
NUMA node7 CPU(s):               168-191
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Retbleed:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] pyzmq==26.4.0
[pip3] torch==2.5.1
[pip3] torch-npu==2.5.1
[pip3] torchvision==0.20.1
[pip3] transformers==4.51.3
[conda] Could not collect
vLLM Version: 0.8.5.post1
vLLM Ascend Version: 0.8.5rc1

ENV Variables:
ATB_OPSRUNNER_KERNEL_CACHE_TILING_SIZE=10240
ATB_OPSRUNNER_KERNEL_CACHE_LOCAL_COUNT=1
ATB_STREAM_SYNC_EVERY_RUNNER_ENABLE=0
ATB_OPSRUNNER_SETUP_CACHE_ENABLE=1
ATB_WORKSPACE_MEM_ALLOC_GLOBAL=0
ATB_DEVICE_TILING_BUFFER_BLOCK_NUM=32
ATB_STREAM_SYNC_EVERY_KERNEL_ENABLE=0
ATB_OPSRUNNER_KERNEL_CACHE_GLOABL_COUNT=5
ATB_HOME_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0
ASCEND_TOOLKIT_HOME=/usr/local/Ascend/ascend-toolkit/latest
ASCEND_DOCKER_RUNTIME=True
ATB_COMPARE_TILING_EVERY_KERNEL=0
ASCEND_OPP_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp
LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_0/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling/lib/linux/aarch64:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/lib:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/examples:/usr/local/Ascend/nnal/atb/latest/atb/cxx_abi_1/tests/atbopstest:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64:/usr/local/Ascend/ascend-toolkit/latest/tools/aml/lib64/plugin:/usr/local/Ascend/ascend-toolkit/latest/lib64:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/opskernel:/usr/local/Ascend/ascend-toolkit/latest/lib64/plugin/nnengine:/usr/local/Ascend/ascend-toolkit/latest/opp/built-in/op_impl/ai_core/tbe/op_tiling:/usr/local/Ascend/driver/lib64/common/:/usr/local/Ascend/driver/lib64/driver/:
ASCEND_AICPU_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_OPSRUNNER_KERNEL_CACHE_TYPE=3
ATB_RUNNER_POOL_SIZE=64
ATB_STREAM_SYNC_EVERY_OPERATION_ENABLE=0
ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/latest
ATB_MATMUL_SHUFFLE_K_ENABLE=1
ATB_LAUNCH_KERNEL_WITH_TILING=1
ATB_WORKSPACE_MEM_ALLOC_ALG_TYPE=1
ATB_HOST_TILING_BUFFER_BLOCK_NUM=128
ATB_SHARE_MEMORY_NAME_SUFFIX=
TORCH_DEVICE_BACKEND_AUTOLOAD=1
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1


NPU:
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.6                   Version: 23.0.6                                               |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 1     910B4               | OK            | 85.2        37                0    / 0             |
| 0                         | 0000:01:00.0  | 0           0    / 0          2802 / 32768         |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU     Chip              | Process id    | Process name             | Process memory(MB)      |
+===========================+===============+====================================================+
| No running processes found in NPU 1                                                            |
+===========================+===============+====================================================+

CANN:
package_name=Ascend-cann-toolkit
version=8.1.RC1
innerversion=V100R001C21SPC001B238
compatible_version=[V100R001C15],[V100R001C18],[V100R001C19],[V100R001C20],[V100R001C21]
arch=aarch64
os=linux
path=/usr/local/Ascend/ascend-toolkit/8.1.RC1/aarch64-linux

🐛 Describe the bug

ERROR 05-15 11:58:24 [engine.py:448] 'model.layers.0.self_attn.q_proj.weight'
ERROR 05-15 11:58:24 [engine.py:448] Traceback (most recent call last):
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/engine/multiprocessing/engine.py", line 436, in run_mp_engine
ERROR 05-15 11:58:24 [engine.py:448] engine = MQLLMEngine.from_vllm_config(
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/engine/multiprocessing/engine.py", line 128, in from_vllm_config
ERROR 05-15 11:58:24 [engine.py:448] return cls(
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/engine/multiprocessing/engine.py", line 82, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.engine = LLMEngine(*args, **kwargs)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/engine/llm_engine.py", line 275, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.model_executor = executor_class(vllm_config=vllm_config)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/executor/executor_base.py", line 52, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self._init_executor()
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 47, in _init_executor
ERROR 05-15 11:58:24 [engine.py:448] self.collective_rpc("load_model")
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 05-15 11:58:24 [engine.py:448] answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/utils.py", line 2456, in run_method
ERROR 05-15 11:58:24 [engine.py:448] return func(*args, **kwargs)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/worker.py", line 235, in load_model
ERROR 05-15 11:58:24 [engine.py:448] self.model_runner.load_model()
ERROR 05-15 11:58:24 [engine.py:448] File "/usr/local/python3.10.17/lib/python3.10/site-packages/mindie_turbo/adaptor/vllm/weight_utils.py", line 94, in postprocess_loading
ERROR 05-15 11:58:24 [engine.py:448] func(self)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm-ascend/vllm_ascend/worker/model_runner.py", line 945, in load_model
ERROR 05-15 11:58:24 [engine.py:448] self.model = get_model(vllm_config=self.vllm_config)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
ERROR 05-15 11:58:24 [engine.py:448] return loader.load_model(vllm_config=vllm_config)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/loader.py", line 452, in load_model
ERROR 05-15 11:58:24 [engine.py:448] model = _initialize_model(vllm_config=vllm_config)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
ERROR 05-15 11:58:24 [engine.py:448] return model_class(vllm_config=vllm_config, prefix=prefix)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 438, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.model = Qwen2Model(vllm_config=vllm_config,
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/compilation/decorators.py", line 151, in __init__
ERROR 05-15 11:58:24 [engine.py:448] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 305, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.start_layer, self.end_layer, self.layers = make_layers(
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 609, in make_layers
ERROR 05-15 11:58:24 [engine.py:448] [PPMissingLayer() for _ in range(start_layer)] + [
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/utils.py", line 610, in <listcomp>
ERROR 05-15 11:58:24 [engine.py:448] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 307, in <lambda>
ERROR 05-15 11:58:24 [engine.py:448] lambda prefix: decoder_layer_type(config=config,
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 205, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.self_attn = Qwen2Attention(
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/models/qwen2.py", line 135, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.qkv_proj = QKVParallelLinear(
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/layers/linear.py", line 850, in __init__
ERROR 05-15 11:58:24 [engine.py:448] super().__init__(input_size=input_size,
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/layers/linear.py", line 396, in __init__
ERROR 05-15 11:58:24 [engine.py:448] super().__init__(input_size,
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm/vllm/model_executor/layers/linear.py", line 243, in __init__
ERROR 05-15 11:58:24 [engine.py:448] self.quant_method = quant_config.get_quant_method(self,
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quant_config.py", line 87, in get_quant_method
ERROR 05-15 11:58:24 [engine.py:448] if self.is_layer_skipped_ascend(prefix,
ERROR 05-15 11:58:24 [engine.py:448] File "/vllm-workspace/vllm-ascend/vllm_ascend/quantization/quant_config.py", line 123, in is_layer_skipped_ascend
ERROR 05-15 11:58:24 [engine.py:448] is_shard_skipped = self.quant_description[shard_prefix +
ERROR 05-15 11:58:24 [engine.py:448] KeyError: 'model.layers.0.self_attn.q_proj.weight'
Traceback (most recent call last):
File "/usr/local/python3.10.17/bin/vllm", line 8, in <module>
sys.exit(main())
File "/vllm-workspace/vllm/vllm/entrypoints/cli/main.py", line 53, in main
args.dispatch_function(args)
File "/vllm-workspace/vllm/vllm/entrypoints/cli/serve.py", line 27, in cmd
uvloop.run(run_server(args))
File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
return loop.run_until_complete(wrapper())
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/python3.10.17/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
return await main
File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 1078, in run_server
async with build_async_engine_client(args) as engine_client:
File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
async with build_async_engine_client_from_engine_args(
File "/usr/local/python3.10.17/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/vllm-workspace/vllm/vllm/entrypoints/openai/api_server.py", line 269, in build_async_engine_client_from_engine_args
raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
[ERROR] 2025-05-15-11:58:25 (PID:6688, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
root@devserver-24ed-0:/vllm-workspace/vllm# /usr/local/python3.10.17/lib/python3.10/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
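The traceback shows `is_layer_skipped_ascend` indexing the quantization description dict directly with `prefix + ".weight"`, which raises `KeyError` when that key is absent. One plausible cause (an assumption, not confirmed by the source): the ModelSlim W8A8 checkpoint records keys for fused shards (e.g. `qkv_proj`) while vLLM probes per-shard names like `q_proj`. A minimal sketch of the failure mode under that assumption — the dict contents and the `is_shard_skipped` helper are hypothetical, not the actual vllm-ascend code:

```python
# Hypothetical quant_description as a W8A8 checkpoint might expose it
# (fused qkv key only -- assumption for illustration):
quant_description = {
    "model.layers.0.self_attn.qkv_proj.weight": "W8A8",
}

def is_shard_skipped(quant_description: dict, shard_prefix: str) -> bool:
    """Sketch of the lookup pattern seen in the traceback: direct
    indexing raises KeyError when the per-shard key is missing."""
    return quant_description[shard_prefix + ".weight"] == "FLOAT"

# Probing the fused key works; probing a per-shard key raises,
# matching the reported KeyError.
try:
    is_shard_skipped(quant_description, "model.layers.0.self_attn.q_proj")
except KeyError as e:
    print(f"KeyError: {e}")
```

A defensive variant would use `quant_description.get(key)` and fall back to the fused-shard key, but whether that is the right fix depends on what the stable ModelSlim export actually writes.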

@Million-mo Million-mo added the bug Something isn't working label May 15, 2025
@Million-mo Million-mo changed the title [Bug]: KeyError: 'model.layers.0.self_attn.q_proj.weight' [Bug]: Qwen2.5 7B W8A8 KeyError: 'model.layers.0.self_attn.q_proj.weight' May 15, 2025
Yikun (Collaborator) commented May 16, 2025

Thanks for the feedback. We are working with ModelSlim to tag a stable version; once it is ready, we will comment here.

@22dimensions is working on this.
