
Commit 4b7b2fc

tjtanaa authored and gshtras committed
[Bugfix] [ROCm]: Remove assertion logic when using AITER fused MoE in UnquantizedFusedMoEMethod to re-enable Llama4 BF16 (vllm-project#18205)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
1 parent 393c960 commit 4b7b2fc

File tree

1 file changed: +0 −1 lines
  • vllm/model_executor/layers/fused_moe

vllm/model_executor/layers/fused_moe/layer.py

Lines changed: 0 additions & 1 deletion
@@ -503,7 +503,6 @@ def forward_cuda(
             indices_type=torch.uint32 if self.moe.use_pplx_kernels else None)
 
         if self.rocm_aiter_moe_enabled:
-            assert not apply_router_weight_on_input
             assert expert_map is None
             return self.rocm_aiter_fused_experts(
                 hidden_states=x,
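
For context, `apply_router_weight_on_input` controls whether the router's top-k weight is multiplied into the hidden states before the expert MLPs run (as Llama4 does) rather than into their outputs afterwards; the removed assertion had rejected that mode on the ROCm AITER path. Below is a minimal sketch of the distinction in plain Python. It is illustrative only, and the names `moe_forward` and `expert_mlps` are hypothetical, not part of vLLM.

import torch
import torch.nn as nn

def moe_forward(x: torch.Tensor,
                topk_weights: torch.Tensor,   # [num_tokens, 1] router weights
                topk_ids: torch.Tensor,       # [num_tokens, 1] expert indices
                expert_mlps: nn.ModuleList,   # one module per expert
                apply_router_weight_on_input: bool) -> torch.Tensor:
    # Top-1 MoE dispatch illustrating where the router weight lands.
    if apply_router_weight_on_input:
        # Llama4-style: scale the activations *before* the expert MLPs.
        x = x * topk_weights
    out = torch.empty_like(x)
    for token, expert in enumerate(topk_ids[:, 0].tolist()):
        out[token] = expert_mlps[expert](x[token])
    if not apply_router_weight_on_input:
        # Conventional MoE: scale the expert *outputs* instead.
        out = out * topk_weights
    return out

# Smoke test: 4 tokens, hidden size 8, 2 experts, top-1 routing.
experts = nn.ModuleList(nn.Linear(8, 8) for _ in range(2))
x = torch.randn(4, 8)
y = moe_forward(x, torch.rand(4, 1), torch.randint(0, 2, (4, 1)), experts,
                apply_router_weight_on_input=True)

Because real experts are nonlinear, the two placements are not interchangeable, so the caller cannot simply rescale the outputs; the kernel path itself has to honor the flag, which is presumably why the assertion could be dropped once the AITER fused-experts path accepted it.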
