First, I apologize for omitting some environment details here; we found this bug in our private server environment.
In short, vllm-ascend uses an additional config to enable the v0-style scheduler for better performance. However, after installing the latest code (commit d066e52013be278c7a3bc54ec9799d8457895f4d of vllm and 218f21d..68fb634 of vllm-ascend), we encountered errors when handling requests, such as:
Runtime Error: object of type KVCacheBlocks has no len()
What happened?
The root cause is that the vllm project recently changed the following parts of KVCacheManager (details can be found at this PR):
- Introduced `KVCacheBlocks`.
- `get_computed_blocks` now returns `tuple[KVCacheBlocks, int]` instead of `Tuple[List[BlockHashType], int]`.
- `allocate_slots` takes one extra argument named `num_new_computed_tokens` and returns `Optional[KVCacheBlocks]`.
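
To illustrate why the return type change produces this error, here is a minimal sketch. It does not use vllm itself: `FakeKVCacheBlocks` is a hypothetical stand-in, and its `blocks` field is an assumption about how the wrapper holds the underlying list. `count_blocks` is likewise a hypothetical helper showing one way scheduler code could tolerate both the old list and the new wrapper.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass
class FakeKVCacheBlocks:
    """Hypothetical stand-in for the new wrapper type (not vllm's class).

    Assumption: the wrapper stores the underlying blocks in a `blocks` field.
    """
    blocks: list = field(default_factory=list)


def count_blocks(computed: Union[list, FakeKVCacheBlocks]) -> int:
    """Return the number of blocks for either the old or the new return type."""
    if isinstance(computed, list):
        # Old style: the manager returned a plain list.
        return len(computed)
    # New style: unwrap the list held by the wrapper object.
    return len(computed.blocks)


wrapped = FakeKVCacheBlocks(blocks=["b0", "b1", "b2"])

# Scheduler code written against the old API calls len() on the result.
# A plain dataclass defines no __len__, so this raises TypeError.
try:
    len(wrapped)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
print(count_blocks(wrapped))
print(count_blocks(["b0", "b1"]))
```

Any v0-style scheduler code that still treats the result of `get_computed_blocks` or `allocate_slots` as a list would hit the same failure until it is updated for the new types.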