Commit e38042d

[Kernel] Disable CUTLASS kernels for fp8 (#5505)
1 parent 33e3b37 commit e38042d

1 file changed, +3 -1 lines changed, in vllm/model_executor/layers/quantization

vllm/model_executor/layers/quantization/fp8.py

Lines changed: 3 additions & 1 deletion
@@ -257,7 +257,9 @@ def apply(self,
         # If dynamic, layer.input_scale is None and x_scale computed from x.
         # If static, layer.input_scale is scalar and x_scale is input_scale.
 
-        if bias is None and self.cutlass_fp8_supported:
+        # Temporarily disable CUTLASS kernels due to an illegal memory access
+        #if bias is None and self.cutlass_fp8_supported:
+        if False:
             qinput, x_scale = ops.scaled_fp8_quant(x, layer.input_scale)
 
             # Fused GEMM_DQ
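
The two context comments in the hunk describe how x_scale is chosen: from the input itself in the dynamic case, or from a stored scalar in the static case. A minimal pure-Python sketch of that selection follows. It is not the vLLM implementation: `select_x_scale`, `quantize`, the E4M3 maximum of 448.0, and the amax-based dynamic scale are all assumptions made for illustration, and the sketch only scales and clamps values without rounding them to the FP8 grid.

```python
from typing import List, Optional, Sequence, Tuple

# Largest finite value of the FP8 E4M3 format (assumed target format).
FP8_E4M3_MAX = 448.0


def select_x_scale(x: Sequence[float], input_scale: Optional[float]) -> float:
    """Pick the per-tensor scale (hypothetical helper, not a vLLM function).

    Static: input_scale is a precomputed scalar and is used as-is.
    Dynamic: input_scale is None, so the scale is derived from x itself.
    """
    if input_scale is not None:
        return input_scale
    amax = max((abs(v) for v in x), default=0.0)
    # Guard against an all-zero input, which would otherwise give a zero scale.
    return max(amax / FP8_E4M3_MAX, 1e-12)


def quantize(
    x: Sequence[float], input_scale: Optional[float] = None
) -> Tuple[List[float], float]:
    """Scale and clamp x into the FP8 range; returns (qinput, x_scale).

    Illustrative stand-in only: values are not rounded to FP8 precision.
    """
    x_scale = select_x_scale(x, input_scale)
    qinput = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / x_scale)) for v in x]
    return qinput, x_scale
```

Calling `quantize(x)` takes the dynamic path, while `quantize(x, 0.5)` reuses the supplied scalar, mirroring the `layer.input_scale is None` distinction in the comments.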
