Possible problem in zero output #279

chu-tianxiang · 2024-01-16T13:28:02Z

Currently, blocks with blockIdx.z = 0 set the output to zero here. However, since there is no synchronization between blocks, is it possible that blocks with a different blockIdx.z finish their calculations and update the output before it has been zeroed?
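To make the hazard concrete, here is a hypothetical sequential model of a single output element. It is not the actual kernel code. `replay`, the `(z, op)` event encoding and the stale value are illustrative assumptions, and `+=` stands in for the kernel's atomic add. Each "block" along z adds its partial sum, and block z = 0 also zeroes the output first. The model replays two legal orderings of those events:

```python
# Hypothetical model of one output element in a split-K style kernel:
# every block along z adds its partial sum into `out`, and block z == 0
# additionally zeroes `out` before adding. Illustrative only.

def replay(schedule, partial, stale=123.0):
    """Apply (z, op) events in the given order; op is "zero" or "add"."""
    out = stale  # garbage left in the buffer by a previous launch
    for z, op in schedule:
        if op == "zero":
            out = 0.0
        else:
            out += partial[z]  # stands in for atomicAdd
    return out

partial = [1.0, 2.0]  # partial sums computed by blocks z = 0 and z = 1

# Order the kernel implicitly relies on: z = 0 zeroes before anyone adds.
intended = [(0, "zero"), (0, "add"), (1, "add")]
# Legal interleaving when both blocks run concurrently on different SMs:
# z = 1 adds its partial before z = 0 reaches the zero-fill.
racy = [(1, "add"), (0, "zero"), (0, "add")]

print(replay(intended, partial))  # 3.0
print(replay(racy, partial))      # 1.0 (the z = 1 contribution is lost)
```

Only an add that lands *before* the zero-fill is lost. Adds that happen after it commute, so they are safe in any order.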


turboderp · 2024-01-17T04:47:44Z

Conventional wisdom would say yes: blocks can launch in any order. In practice (and I tested this a lot), I've never seen block (x, y, 0) launch after (x, y, 1). There's some discussion of it here, along with some tests of the correlation between launch time and blockIdx, which turns out to be very strong.

Of course, relying on it is still a little hacky since there are no actual guarantees from NVIDIA. I definitely want to find a better solution, and I'm trying to rework the kernel to do FP32 accumulation anyway, which may make it a non-issue soon.

chu-tianxiang · 2024-01-17T07:57:44Z

Actually, the concern is that modern GPUs have enough SMs for blocks (x, y, 0) and (x, y, 1) to be launched concurrently, so even if (x, y, 0) launches first, it's hard to guarantee that it finishes the zero-fill before (x, y, 1) starts writing.
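One common way to remove the dependency is to clear the output before the kernel launches. This is offered as an assumption, not as what the project actually did. The clear could be done with `cudaMemsetAsync` on the same stream, or by allocating the output with `torch.zeros`. Work issued to a single stream executes in order, so the clear completes before any block starts, and every block then only adds. The Python model below is hypothetical: threads stand in for blocks, a lock stands in for `atomicAdd`, and `split_k_accumulate` is an illustrative name.

```python
import random
import threading
import time

def split_k_accumulate(partial, seed=None):
    """Hypothetical model of the fix: the output is cleared before any
    block starts (standing in for a stream-ordered memset issued before
    the launch), so blocks only ever add and their order does not matter."""
    rng = random.Random(seed)
    out = [123.0]              # stale value left by a previous launch
    out[0] = 0.0               # the "memset": happens before every block below
    lock = threading.Lock()    # stands in for atomicAdd on the output

    # Random start delays, drawn up front so threads never share the RNG.
    delays = [rng.random() * 0.001 for _ in partial]

    def block(z):
        time.sleep(delays[z])  # blocks start and finish in arbitrary order
        with lock:
            out[0] += partial[z]

    threads = [threading.Thread(target=block, args=(z,)) for z in range(len(partial))]
    rng.shuffle(threads)       # launch order is random as well
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out[0]

for seed in range(5):
    print(split_k_accumulate([1.0, 2.0, 3.0, 4.0], seed=seed))  # 10.0 every time
```

The partials are small integers, so the floating-point sum is exact in every order. The cost is one extra memset per launch. Another option is to write partials to a scratch buffer and reduce them in a second pass, which avoids both the ordering assumption and the atomics.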
