
[Bug]: [pull_cache] failed, error code is LLMStatusCode.LLM_KV_CACHE_NOT_EXIST, Cachekey #899


Open
njuptxhy opened this issue May 19, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@njuptxhy

Your current environment

The output of `python collect_env.py`

docker run -it \
  --name vllm-ascend-xhy-pd-8 \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci4 \
  --device /dev/davinci5 \
  --device /dev/davinci6 \
  --device /dev/davinci7 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -v /mnt/nvme0/xhy/workspace:/workspace-xhy \
  --privileged=true --net=host \
  -it vllm-ascend-xhy-pd:v1.0 bash
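If it helps, the actual `collect_env.py` output can be captured from inside the running container along these lines; this is only a sketch, and the `/workspace-xhy/vllm-ascend` path is an assumption about where the script lives in the image.

```bash
# Hypothetical sketch: run vllm-ascend's environment collector inside the
# already-running container. Adjust the path to wherever collect_env.py
# actually lives in your image.
docker exec -it vllm-ascend-xhy-pd-8 \
  bash -c "cd /workspace-xhy/vllm-ascend && python collect_env.py"
```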

🐛 Describe the bug

WARNING 05-08 09:54:12 connector.py:342] [rank71]: Failed to receive all KVs and hidden states, redo model forwarding

The warning indicates that the p node failed to receive the KV cache and hidden states from the d node, resulting in the error code LLM_KV_CACHE_NOT_EXIST, so the p node had to redo model forwarding. Potential causes include network issues between the nodes, a cache that was not properly initialized on the d node, misconfiguration of the services or nodes, problems with request ordering, or resource limits causing the request to fail.
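A quick way to rule out the first of these causes (network issues between the nodes) is a basic reachability check between the prefill and decode hosts. The addresses and port below are placeholders for whatever your KV-transfer endpoint is actually configured to use:

```bash
# Placeholder diagnostic sketch: substitute the real prefill/decode node
# addresses and the port used for KV-cache transfer in your deployment.
PREFILL_HOST=10.0.0.1
DECODE_HOST=10.0.0.2
KV_PORT=26000

# From the prefill node: can the decode node and its transfer port be reached?
ping -c 3 "$DECODE_HOST"
nc -zv "$DECODE_HOST" "$KV_PORT"

# From the decode node: the same check back toward the prefill node.
ping -c 3 "$PREFILL_HOST"
nc -zv "$PREFILL_HOST" "$KV_PORT"
```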

@njuptxhy njuptxhy added the bug Something isn't working label May 19, 2025
@jianzs
Collaborator

jianzs commented May 19, 2025

Please provide details about your parallelism settings, such as the tensor/data parallel sizes for both the prefill and decode nodes, the number of NPUs in use, etc.
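For reference, the requested details could be gathered roughly like this; the model path and parallel sizes are placeholders, and the exact launch commands used on each node (including any KV-transfer or disaggregated-prefill options) should be included verbatim:

```bash
# On each node: list the NPUs visible to the container.
npu-smi info

# Placeholder launch commands -- replace with the exact ones actually used,
# including the tensor/data parallel sizes and any KV-transfer flags.
vllm serve /path/to/model --tensor-parallel-size 4   # prefill node
vllm serve /path/to/model --tensor-parallel-size 4   # decode node
```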
