
[Bug]: [pull_cache] failed, error code is LLMStatusCode.LLM_KV_CACHE_NOT_EXIST, Cachekey #899


Open
njuptxhy opened this issue May 19, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@njuptxhy

Your current environment

The output of `python collect_env.py`

docker run -it \
  --name vllm-ascend-xhy-pd-8 \
  --device /dev/davinci0 \
  --device /dev/davinci1 \
  --device /dev/davinci2 \
  --device /dev/davinci3 \
  --device /dev/davinci4 \
  --device /dev/davinci5 \
  --device /dev/davinci6 \
  --device /dev/davinci7 \
  --device /dev/davinci_manager \
  --device /dev/devmm_svm \
  --device /dev/hisi_hdc \
  -v /usr/local/dcmi:/usr/local/dcmi \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/Ascend/driver/version.info:/usr/local/Ascend/driver/version.info \
  -v /etc/ascend_install.info:/etc/ascend_install.info \
  -v /root/.cache:/root/.cache \
  -v /mnt/nvme0/xhy/workspace:/workspace-xhy \
  --privileged=true --net=host \
  -it vllm-ascend-xhy-pd:v1.0 bash
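If it helps, the actual `collect_env.py` output can be captured from inside the running container along these lines; this is only a sketch, and the `/workspace-xhy/vllm-ascend` path is an assumption about where the script lives in the image.

```bash
# Hypothetical sketch: run vllm-ascend's environment collector inside the
# already-running container. Adjust the path to wherever collect_env.py
# actually lives in your image.
docker exec -it vllm-ascend-xhy-pd-8 \
  bash -c "cd /workspace-xhy/vllm-ascend && python collect_env.py"
```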

🐛 Describe the bug

WARNING 05-08 09:54:12 connector.py:342] [rank71]: Failed to receive all KVs and hidden states, redo model forwarding

The warning indicates that the p node failed to receive the KV cache and hidden states from the d node, resulting in the error code LLM_KV_CACHE_NOT_EXIST, so the p node had to redo model forwarding. Potential causes include network issues between the nodes, a cache that was not properly initialized on the d node, misconfiguration of the services or nodes, problems with request ordering, or resource limits causing the request to fail.
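A quick way to rule out the first of these causes (network issues between the nodes) is a basic reachability check between the prefill and decode hosts. The addresses and port below are placeholders for whatever your KV-transfer endpoint is actually configured to use:

```bash
# Placeholder diagnostic sketch: substitute the real prefill/decode node
# addresses and the port used for KV-cache transfer in your deployment.
PREFILL_HOST=10.0.0.1
DECODE_HOST=10.0.0.2
KV_PORT=26000

# From the prefill node: can the decode node and its transfer port be reached?
ping -c 3 "$DECODE_HOST"
nc -zv "$DECODE_HOST" "$KV_PORT"

# From the decode node: the same check back toward the prefill node.
ping -c 3 "$PREFILL_HOST"
nc -zv "$PREFILL_HOST" "$KV_PORT"
```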

@njuptxhy njuptxhy added the bug Something isn't working label May 19, 2025
@jianzs
Collaborator

jianzs commented May 19, 2025

Please provide details about your parallelism settings, such as the tensor/data parallel sizes for both the prefill and decode nodes, the number of NPUs in use, etc.
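For reference, the requested details could be gathered roughly like this; the model path and parallel sizes are placeholders, and the exact launch commands used on each node (including any KV-transfer or disaggregated-prefill options) should be included verbatim:

```bash
# On each node: list the NPUs visible to the container.
npu-smi info

# Placeholder launch commands -- replace with the exact ones actually used,
# including the tensor/data parallel sizes and any KV-transfer flags.
vllm serve /path/to/model --tensor-parallel-size 4   # prefill node
vllm serve /path/to/model --tensor-parallel-size 4   # decode node
```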
