Doc mask returns negative sparsity #93
The displayed sparsity for the block mask is negative when the block size is larger than the maximum size of the mask. Similar to issue #68.

torch nightly: 2.6.0.dev20241221+cu124

repro code:

output:

Does this have an effect on the results obtained with such a mask?
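The original repro snippet and its output are not preserved above. As a stand-in, here is a minimal, hypothetical sketch (not the reporter's code) of how the described condition can arise, assuming the public `torch.nn.attention.flex_attention.create_block_mask` API, a CUDA device, and a document mask over a sequence shorter than the default 128-token block size:

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask

# Hypothetical setup: a single document of length 64, i.e. shorter than the
# default BLOCK_SIZE of 128 — the condition the issue describes.
SEQ_LEN = 64
doc_ids = torch.zeros(SEQ_LEN, dtype=torch.long, device="cuda")

def doc_mask(b, h, q_idx, kv_idx):
    # Attend only within the same document.
    return doc_ids[q_idx] == doc_ids[kv_idx]

block_mask = create_block_mask(
    doc_mask, B=None, H=None, Q_LEN=SEQ_LEN, KV_LEN=SEQ_LEN, device="cuda"
)
print(block_mask)             # the repr includes the computed sparsity
print(block_mask.sparsity())  # reported as negative in the issue's setup
```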
This should not have any effect on the result; the displayed sparsity is an independent code path from what is used to run with flex attention. I noticed this as well and I know the root cause. I am working on the fix here. Cc @Chillee
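To make the "independent code path" point concrete, here is a hedged, self-contained sketch (again assuming a CUDA device and the public `flex_attention` API): the same block mask can still be passed to `flex_attention`, and, consistent with the comment above, the sparsity figure shown in the repr is not what the attention computation consumes.

```python
import torch
from torch.nn.attention.flex_attention import create_block_mask, flex_attention

# Hypothetical setup mirroring the sketch above.
SEQ_LEN, HEAD_DIM = 64, 64
doc_ids = torch.zeros(SEQ_LEN, dtype=torch.long, device="cuda")

def doc_mask(b, h, q_idx, kv_idx):
    # Attend only within the same document.
    return doc_ids[q_idx] == doc_ids[kv_idx]

block_mask = create_block_mask(
    doc_mask, B=None, H=None, Q_LEN=SEQ_LEN, KV_LEN=SEQ_LEN, device="cuda"
)

q, k, v = (torch.randn(1, 1, SEQ_LEN, HEAD_DIM, device="cuda") for _ in range(3))
out = flex_attention(q, k, v, block_mask=block_mask)

# Per the comment above, the attention computation uses the mask's block index
# data, not the displayed sparsity, so a misreported sparsity does not change `out`.
print(out.shape)
```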
When running the code above, I encountered the following error. My PyTorch version is 2.5.1+cu121. Does FlexAttention support Q, K, V with sequence lengths that are not a multiple of the 128 block size? Any help or insights would be greatly appreciated.

Can you try upgrading to a newer version of PyTorch? We have fixed many bugs since 2.5.1.