Running on a GeForce RTX 3090 (Ampere architecture) the code stalls in the SortKernel. Possibly because certain threads no longer co-reside on the same multiprocessor during shared memory reductions in the SummarizationKernel and/or SortKernel.
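The kernel source is not quoted in this thread, so the following is only a hypothetical sketch of one thing worth checking. It assumes the reductions in `SummarizationKernel`/`SortKernel` follow a classic shared-memory tree pattern. The names `blockSumKernel`, `gpuSum` and `kBlockSize` are made up for illustration. Since Volta (and therefore on Ampere), threads of a warp are no longer guaranteed to execute in lockstep. A reduction that drops the barrier for the last 32 elements ("warp-synchronous" code) can therefore read stale shared memory or hang. The sketch below keeps a `__syncthreads()` between every step, reached by all threads of the block.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical block size; the tree reduction below needs a power of two.
constexpr int kBlockSize = 256;
static_assert((kBlockSize & (kBlockSize - 1)) == 0, "block size must be a power of two");

// Sums up to kBlockSize floats per block into out[blockIdx.x].
// Every step is separated by a block-wide barrier, so the code does not
// assume that the threads of a warp run in lockstep.
__global__ void blockSumKernel(const float* in, float* out, int n) {
    __shared__ float partial[kBlockSize];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            partial[tid] += partial[tid + stride];
        }
        // Barrier is outside the `if`, so every thread of the block reaches it.
        __syncthreads();
    }
    if (tid == 0) {
        out[blockIdx.x] = partial[0];
    }
}

// Host helper: reduces `data` on the GPU, then adds the per-block sums on the CPU.
// Errors are checked with assert() to keep the sketch short.
float gpuSum(const std::vector<float>& data) {
    int n = static_cast<int>(data.size());
    if (n == 0) return 0.0f;  // a launch with zero blocks would be invalid
    int blocks = (n + kBlockSize - 1) / kBlockSize;

    float* dIn = nullptr;
    float* dOut = nullptr;
    assert(cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess);
    assert(cudaMalloc(&dOut, blocks * sizeof(float)) == cudaSuccess);
    assert(cudaMemcpy(dIn, data.data(), n * sizeof(float), cudaMemcpyHostToDevice) == cudaSuccess);

    blockSumKernel<<<blocks, kBlockSize>>>(dIn, dOut, n);
    assert(cudaGetLastError() == cudaSuccess);

    std::vector<float> partials(blocks);
    // This copy also waits for the kernel and reports errors raised while it ran.
    assert(cudaMemcpy(partials.data(), dOut, blocks * sizeof(float), cudaMemcpyDeviceToHost) == cudaSuccess);
    cudaFree(dIn);
    cudaFree(dOut);

    float total = 0.0f;
    for (float p : partials) total += p;
    return total;
}
```

If the real kernels instead unroll the final warp without barriers, inserting `__syncwarp()` between those steps would be the lighter-weight equivalent. This is an assumption about the cause, not a confirmed diagnosis of this issue.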
I think I am encountering this problem too, on an A6000. If I comment out the call to SortKernel, the code finishes all the iterations without stalling. This bug is a shame, because the performance seems very good.
From the paper on this project, I see that we used CUDA 7.5 at the time. I checked the corresponding documentation ('CUDA Compiler Driver NVCC'), and in CUDA 7.5 nvcc defaults to compiling with
Explicitly targeting a 'virtual architecture' (e.g. explicitly passing the CUDA 7.5 defaults) would fix some things. I'm curious whether it would resolve the stalling, but unfortunately I no longer have access to an Ampere GPU. Even if it does resolve the stalling, breaking changes between CUDA 7.5 and recent versions would need to be addressed as well.
Edit: I see support for Compute Capability 2.0 was dropped starting with CUDA 9.0.