Commit 5998704

[BugFix] Fix ascend scheduler bugs. (vllm-project#822)
This PR fixes two bugs in AscendScheduler:

1. Under high concurrency, the length of the running queue may exceed the max_num_seqs limit.
2. When requests are preempted and recomputation is enabled, the number of new tokens is computed incorrectly.

Signed-off-by: whx-sjtu <2952154980@qq.com>
1 parent 701b0fd commit 5998704
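
The first bug in the commit message can be illustrated with a toy admission loop. This is a minimal sketch, not the AscendScheduler code: the `admit` and `demo` functions and the `fixed` flag are hypothetical, and it assumes that each admitted request is appended to the running queue inside the loop. Only the names `scheduled_new_reqs`, `running`, and `max_num_running_reqs` come from the diff.

```python
from collections import deque


def admit(waiting, running, max_num_running_reqs, fixed):
    """Toy admission loop (hypothetical); returns requests admitted this step."""
    scheduled_new_reqs = []
    while waiting:
        # Old check: counts only the requests admitted in this step.
        # New check: counts every request currently in the running queue.
        in_use = len(running) if fixed else len(scheduled_new_reqs)
        if in_use == max_num_running_reqs:
            break
        request = waiting.popleft()
        running.append(request)  # assumption: admission grows the running queue
        scheduled_new_reqs.append(request)
    return scheduled_new_reqs


def demo(fixed):
    running = ["r0", "r1", "r2"]  # carried over from earlier scheduling steps
    waiting = deque(["r3", "r4", "r5"])
    admit(waiting, running, max_num_running_reqs=4, fixed=fixed)
    return len(running)


old_len = demo(fixed=False)  # cap of 4 is ignored for carried-over requests
new_len = demo(fixed=True)   # admission stops once the queue holds 4 requests
print(old_len, new_len)
```

With three requests already running and a cap of four, the old check admits all three waiting requests because it only looks at this step's admissions, while the new check stops after one.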

1 file changed, +2 -2 lines changed

vllm_ascend/core/scheduler.py

Lines changed: 2 additions & 2 deletions
@@ -74,7 +74,7 @@ def schedule(self) -> SchedulerOutput:
 
         # Schedule prefill requests first.
         while self.waiting and token_budget > 0:
-            if len(scheduled_new_reqs) == self.max_num_running_reqs:
+            if len(self.running) == self.max_num_running_reqs:
                 break
 
             request = self.waiting[0]
@@ -96,7 +96,7 @@ def skip_cur_request():
             # Get already-cached tokens.
             computed_blocks, num_computed_tokens = (
                 self.kv_cache_manager.get_computed_blocks(request))
-            num_new_tokens = request.num_prompt_tokens - num_computed_tokens
+            num_new_tokens = request.num_tokens - num_computed_tokens
             if (0 < self.scheduler_config.long_prefill_token_threshold <
                     num_new_tokens):
                 num_new_tokens = (
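
The second hunk changes which token count is used when a request is scheduled from the waiting queue. The sketch below is a hedged illustration of why that matters after preemption with recomputation. `ToyRequest` is a hypothetical stand-in, not the real request class: it only mirrors the attribute names `num_prompt_tokens` and `num_tokens` from the diff, and it assumes that `num_tokens` covers prompt plus already-generated tokens and that preemption leaves nothing cached.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRequest:
    """Hypothetical stand-in for a scheduler request."""
    prompt_token_ids: list
    output_token_ids: list = field(default_factory=list)

    @property
    def num_prompt_tokens(self):
        return len(self.prompt_token_ids)

    @property
    def num_tokens(self):
        # Assumption: prompt tokens plus tokens generated before preemption.
        return len(self.prompt_token_ids) + len(self.output_token_ids)


# A request that had generated 3 tokens from an 8-token prompt, then was
# preempted and sent back to the waiting queue for recomputation.
req = ToyRequest(prompt_token_ids=list(range(8)),
                 output_token_ids=[100, 101, 102])
num_computed_tokens = 0  # assumption: no cached blocks survive preemption

old_num_new_tokens = req.num_prompt_tokens - num_computed_tokens
new_num_new_tokens = req.num_tokens - num_computed_tokens
print(old_num_new_tokens, new_num_new_tokens)
```

Under these assumptions the old expression schedules only the 8 prompt tokens and leaves out the 3 generated tokens that also need to be recomputed, while the new expression covers all 11.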
