Commit 8648b1e

Implements the dual-chunk-flash-attn backend for dual chunk attention with sparse attention support.
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
1 parent: e66faf4

21 files changed, +2439 −47 lines

CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -189,6 +189,7 @@ set(VLLM_EXT_SRC
   "csrc/cache_kernels.cu"
   "csrc/attention/paged_attention_v1.cu"
   "csrc/attention/paged_attention_v2.cu"
+  "csrc/attention/vertical_slash_index.cu"
   "csrc/pos_encoding_kernels.cu"
   "csrc/activation_kernels.cu"
   "csrc/layernorm_kernels.cu"
@@ -550,7 +551,7 @@ else()
   FetchContent_Declare(
     vllm-flash-attn
     GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
-    GIT_TAG 96266b1111111f3d11aabefaf3bacbab6a89d03c
+    GIT_TAG 323b789ae92b0376c0adfcfe1b48571ac32dc411
     GIT_PROGRESS TRUE
     # Don't share the vllm-flash-attn build between build types
     BINARY_DIR ${CMAKE_BINARY_DIR}/vllm-flash-attn
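
For context on the second hunk, here is a minimal sketch of how a FetchContent dependency pinned to an exact commit is typically declared and consumed in CMake. The standalone `demo_ext` project layout is an illustrative assumption, not part of this commit; vLLM's real build wires the fetched dependency into its own extension targets.

# Illustrative sketch only (assumed standalone project, not vLLM's actual build):
# pin a Git dependency to a fixed commit with FetchContent and pull it into the build.
cmake_minimum_required(VERSION 3.24)
project(demo_ext LANGUAGES CXX)

include(FetchContent)
FetchContent_Declare(
  vllm-flash-attn
  GIT_REPOSITORY https://github.com/vllm-project/flash-attention.git
  GIT_TAG 323b789ae92b0376c0adfcfe1b48571ac32dc411  # the exact commit pinned in the diff above
  GIT_PROGRESS TRUE
)
# Clones the repository at that commit (if not already populated) and adds its
# CMakeLists.txt to this build, exposing whatever targets it defines.
FetchContent_MakeAvailable(vllm-flash-attn)

Pinning GIT_TAG to a full commit hash rather than a branch name keeps the build reproducible, which is why the dependency bump appears in the diff as an explicit hash change.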
