CUDA graph utilization #4356
May I ask how CUDA graphs are used in AMReX?
Answered by
WeiqunZhang
Mar 11, 2025
Our old communication functions launched a lot of small kernels, so we used cudaGraph to reduce the kernel launch overhead. Later, we found that manually fusing the small kernels was faster than cudaGraph for our cases. So we no longer use cudaGraph in communication unless one forces it by setting a cudaGraph region.
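To make the two approaches concrete, here is a minimal CUDA sketch of the difference between replaying many small launches through a graph and fusing them into one launch. It is not AMReX's code: the kernels `copy_segment` and `copy_all_segments`, the host functions, and the equal-length segment layout are all hypothetical and chosen only for illustration. It assumes CUDA 11.4 or newer for `cudaGraphInstantiateWithFlags`, and error checking is kept to a minimum.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical small kernel: copies one short segment, standing in for
// one small pack/unpack operation in a communication routine.
__global__ void copy_segment(const float* src, float* dst, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) { dst[i] = src[i]; }
}

// Hypothetical fused kernel: a single launch covers every segment.
// blockIdx.y selects the segment (assumes nseg <= 65535, the gridDim.y limit).
__global__ void copy_all_segments(const float* src, float* dst, int nseg, int seglen)
{
    int seg = blockIdx.y;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (seg < nseg && i < seglen) { dst[seg * seglen + i] = src[seg * seglen + i]; }
}

// Approach 1: capture the many small launches into a CUDA graph and replay it.
void copy_with_graph(const float* d_src, float* d_dst, int nseg, int seglen)
{
    const int threads = 128;
    const int blocks = (seglen + threads - 1) / threads;
    cudaStream_t stream;
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    cudaStreamCreate(&stream);

    // Launches issued between Begin/EndCapture are recorded, not executed.
    cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
    for (int s = 0; s < nseg; ++s) {
        copy_segment<<<blocks, threads, 0, stream>>>(d_src + s * seglen,
                                                     d_dst + s * seglen, seglen);
    }
    cudaStreamEndCapture(stream, &graph);

    cudaGraphInstantiateWithFlags(&exec, graph, 0);
    cudaGraphLaunch(exec, stream);   // one submission replays all nseg kernels
    cudaStreamSynchronize(stream);

    cudaGraphExecDestroy(exec);
    cudaGraphDestroy(graph);
    cudaStreamDestroy(stream);
}

// Approach 2: manual fusion, one kernel launch for all segments.
void copy_fused(const float* d_src, float* d_dst, int nseg, int seglen)
{
    const int threads = 128;
    dim3 grid((seglen + threads - 1) / threads, nseg);
    copy_all_segments<<<grid, threads>>>(d_src, d_dst, nseg, seglen);
    cudaDeviceSynchronize();
}

// Small driver: runs one approach and checks that the copy is exact.
bool run_and_check(bool use_graph)
{
    const int nseg = 64, seglen = 256, n = nseg * seglen;
    std::vector<float> h_src(n), h_dst(n, 0.0f);
    for (int i = 0; i < n; ++i) { h_src[i] = static_cast<float>(i); }

    float *d_src = nullptr, *d_dst = nullptr;
    cudaMalloc(&d_src, n * sizeof(float));
    cudaMalloc(&d_dst, n * sizeof(float));
    cudaMemcpy(d_src, h_src.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_dst, 0, n * sizeof(float));

    if (use_graph) { copy_with_graph(d_src, d_dst, nseg, seglen); }
    else           { copy_fused(d_src, d_dst, nseg, seglen); }

    cudaMemcpy(h_dst.data(), d_dst, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_src);
    cudaFree(d_dst);
    return cudaGetLastError() == cudaSuccess && h_src == h_dst;
}
```

Both `run_and_check(true)` and `run_and_check(false)` should produce the same copied data. The graph version still executes one kernel node per segment, whereas the fused version executes a single kernel, which is one plausible reason fusion can come out ahead when the individual kernels are very small. A real graph user would also instantiate once and relaunch many times rather than rebuilding the graph on every call.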
Answer selected by
indra098124