Possible problem in zero output #279

Open
chu-tianxiang opened this issue Jan 16, 2024 · 2 comments

Comments

@chu-tianxiang

Currently, blocks where blockIdx.z = 0 set the output to zero here. However, since there is no synchronization between blocks, is it possible that blocks with a different blockIdx.z finish their calculations and update the output before it has been zeroed?

@turboderp
Member

Conventional wisdom would say yes, it's possible for blocks to launch in any order. In practice (and I tested this a lot) I've never seen block (x, y, 0) launch after (x, y, 1). There's some discussion of it here and some tests of the correlation between launch time and blockIdx, which turns out to be very strong.

Of course, relying on it is still a little hacky, since NVIDIA gives no actual guarantees about block launch order. I definitely want to find a better solution, and I'm trying to rework the kernel to do FP32 accumulation anyway, which may soon make this a non-issue.
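For comparison, a common way to remove any dependence on block order in split-K style kernels is to give each z-slice its own FP32 workspace and reduce the partial results in a second kernel. The sketch below is not this project's kernel or the rework mentioned above. It is a hypothetical FP32 matrix-vector stand-in that assumes grid.z splits the K dimension and that blocks with different blockIdx.z would otherwise accumulate into the same output. The names splitk_partial, splitk_reduce and matvec_two_pass are illustrative only.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Pass 1 (hypothetical): each z-slice stores its FP32 partial sum in its own
// row of a workspace. No two blocks write the same element, so nothing has
// to be zeroed first and launch order is irrelevant.
__global__ void splitk_partial(const float* a, const float* x, float* work,
                               int rows, int k, int k_per_slice) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    int slice = blockIdx.z;
    int k0 = slice * k_per_slice;
    int k1 = min(k0 + k_per_slice, k);  // slices past the end of K store 0
    float acc = 0.0f;
    for (int i = k0; i < k1; ++i) acc += a[row * k + i] * x[i];
    work[slice * rows + row] = acc;  // plain store, no atomics
}

// Pass 2: sum the slices in a fixed order, so the result is deterministic.
__global__ void splitk_reduce(const float* work, float* out, int rows,
                              int slices) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    float sum = 0.0f;
    for (int s = 0; s < slices; ++s) sum += work[s * rows + row];
    out[row] = sum;
}

// Host helper: a is rows x k (row-major), x has k elements.
std::vector<float> matvec_two_pass(const std::vector<float>& a,
                                   const std::vector<float>& x,
                                   int rows, int k, int slices) {
    float *d_a = nullptr, *d_x = nullptr, *d_work = nullptr, *d_out = nullptr;
    cudaMalloc(&d_a, a.size() * sizeof(float));
    cudaMalloc(&d_x, x.size() * sizeof(float));
    cudaMalloc(&d_work, static_cast<size_t>(slices) * rows * sizeof(float));
    cudaMalloc(&d_out, rows * sizeof(float));
    cudaMemcpy(d_a, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);

    const int threads = 128;
    const int blocks = (rows + threads - 1) / threads;
    const int k_per_slice = (k + slices - 1) / slices;
    splitk_partial<<<dim3(blocks, 1, slices), threads>>>(d_a, d_x, d_work,
                                                         rows, k, k_per_slice);
    // Both launches go to the default stream, so splitk_reduce starts only
    // after every block of splitk_partial has finished.
    splitk_reduce<<<blocks, threads>>>(d_work, d_out, rows, slices);

    std::vector<float> out(rows);
    // cudaMemcpy on the default stream waits for the reduction to finish.
    cudaMemcpy(out.data(), d_out, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_x);
    cudaFree(d_work);
    cudaFree(d_out);
    return out;
}
```

The trade-off is extra memory (slices x rows floats) and a second launch, in exchange for no atomics, no zero-fill, and a result that does not change from run to run.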

@chu-tianxiang
Author

Actually, the concern is that modern GPUs have enough SMs that blocks (x, y, 0) and (x, y, 1) are launched concurrently. So even if block (x, y, 0) launches first, it's hard to guarantee that it finishes the zero-fill before block (x, y, 1) starts updating the output.
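A minimal sketch of a fix that makes no assumption about block order: zero the output with cudaMemset on the same stream before the launch, and have every block, including blockIdx.z = 0, only accumulate. Work in a single stream executes in order, so the memset completes before any block starts. The kernel splitk_matvec_accumulate and the helper matvec_zero_then_accumulate are hypothetical FP32 stand-ins, not this project's kernel, and the sketch assumes grid.z splits the K dimension.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical split-K style matrix-vector product (FP32 stand-in).
// grid.z splits the K dimension. Every block, including blockIdx.z == 0,
// only ADDS into out[] and never zeroes it, so block order does not matter.
__global__ void splitk_matvec_accumulate(const float* a, const float* x,
                                         float* out, int rows, int k,
                                         int k_per_slice) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;
    int k0 = static_cast<int>(blockIdx.z) * k_per_slice;
    int k1 = min(k0 + k_per_slice, k);
    float acc = 0.0f;
    for (int i = k0; i < k1; ++i) acc += a[row * k + i] * x[i];
    atomicAdd(&out[row], acc);  // order-independent accumulation
}

// Host helper: a is rows x k (row-major), x has k elements.
std::vector<float> matvec_zero_then_accumulate(const std::vector<float>& a,
                                               const std::vector<float>& x,
                                               int rows, int k, int slices) {
    float *d_a = nullptr, *d_x = nullptr, *d_out = nullptr;
    cudaMalloc(&d_a, a.size() * sizeof(float));
    cudaMalloc(&d_x, x.size() * sizeof(float));
    cudaMalloc(&d_out, rows * sizeof(float));  // contents are undefined here
    cudaMemcpy(d_a, a.data(), a.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_x, x.data(), x.size() * sizeof(float), cudaMemcpyHostToDevice);

    // The zero-fill, moved out of the kernel. It is issued to the default
    // stream, so it completes before the kernel below begins.
    cudaMemset(d_out, 0, rows * sizeof(float));

    const int threads = 128;
    const int k_per_slice = (k + slices - 1) / slices;
    dim3 grid((rows + threads - 1) / threads, 1, slices);
    splitk_matvec_accumulate<<<grid, threads>>>(d_a, d_x, d_out, rows, k,
                                                k_per_slice);

    std::vector<float> out(rows);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(out.data(), d_out, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    cudaFree(d_x);
    cudaFree(d_out);
    return out;
}
```

Because the slices are added with atomics in whatever order the blocks finish, the floating-point summation order can still vary between runs, so results may differ in the last bits even though the zero-fill race is gone.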
