
Commit 3c1caa7

fix moe
1 parent a4feba9 commit 3c1caa7

vllm/model_executor/layers/fused_moe/fused_moe.py

Lines changed: 5 additions & 3 deletions
@@ -492,12 +492,14 @@ def fused_experts(hidden_states: torch.Tensor,
         if tokens_in_chunk == 0:
             break
 
-        if tokens_in_chunk < CHUNK_SIZE:
-            # will only happen in the last chunk
+        if tokens_in_chunk < CHUNK_SIZE and chunk > 0:
+            # Adjust the intermediate cache size and config for the last
+            # chunk. Note that in most cases we only have one chunk
+            # so the cache size and config are already set correctly and
+            # do not need to be adjusted.
             intermediate_cache1 = intermediate_cache1[:tokens_in_chunk]
             intermediate_cache2 = intermediate_cache2[:tokens_in_chunk]
             intermediate_cache3 = intermediate_cache3[:tokens_in_chunk]
-            # reload config to get better performance on the last chunk
             config = get_config_func(tokens_in_chunk)
 
         curr_topk_ids = topk_ids[begin_chunk_idx:end_chunk_idx]