
cuda graph utilization #4356

Closed Answered by WeiqunZhang
indra098124 asked this question in Q&A

Our old communication functions launched a lot of small kernels, so we used cudaGraph to reduce the kernel launch overhead. Later we found that, for our cases, manually fusing the small kernels was faster than cudaGraph. So we no longer use cudaGraph in communication unless you force it by setting a cudaGraph region.
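To illustrate the two approaches, here is a minimal standalone sketch in plain CUDA. It is not the library's actual communication code: the kernels `scale_one` and `scale_all`, the helper `run_demo`, and the buffer sizes are all hypothetical stand-ins. Option A captures several small launches into a graph and replays them with one `cudaGraphLaunch`; option B fuses the same work into a single kernel whose `blockIdx.y` selects the buffer. It assumes a toolkit that provides `cudaGraphInstantiateWithFlags` (CUDA 11.4 or newer).

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e_ = (call); if (e_ != cudaSuccess) { \
    std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(e_)); \
    std::exit(1); } } while (0)

// Hypothetical "small kernel": scales one short buffer.
__global__ void scale_one(float* buf, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] *= a;
}

// Hypothetical fused kernel: one launch covers all buffers,
// with blockIdx.y selecting which buffer a block works on.
__global__ void scale_all(float* const* bufs, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) bufs[blockIdx.y][i] *= a;
}

// Runs both options back to back; returns true if every element ends up 1*2*3 = 6.
bool run_demo() {
    const int nbufs = 8, n = 256, threads = 128;
    const int blocks = (n + threads - 1) / threads;

    // Do all allocations and blocking copies before stream capture starts.
    std::vector<float*> h_ptrs(nbufs);
    std::vector<float> host(n, 1.0f);
    for (int b = 0; b < nbufs; ++b) {
        CHECK(cudaMalloc(&h_ptrs[b], n * sizeof(float)));
        CHECK(cudaMemcpy(h_ptrs[b], host.data(), n * sizeof(float), cudaMemcpyHostToDevice));
    }
    float** d_ptrs = nullptr;
    CHECK(cudaMalloc(&d_ptrs, nbufs * sizeof(float*)));
    CHECK(cudaMemcpy(d_ptrs, h_ptrs.data(), nbufs * sizeof(float*), cudaMemcpyHostToDevice));

    cudaStream_t stream;
    CHECK(cudaStreamCreate(&stream));

    // Option A: capture nbufs small launches into a graph, then replay them with one call.
    cudaGraph_t graph;
    cudaGraphExec_t exec;
    CHECK(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
    for (int b = 0; b < nbufs; ++b) {
        scale_one<<<blocks, threads, 0, stream>>>(h_ptrs[b], n, 2.0f);
    }
    CHECK(cudaStreamEndCapture(stream, &graph));
    CHECK(cudaGraphInstantiateWithFlags(&exec, graph, 0));
    CHECK(cudaGraphLaunch(exec, stream));

    // Option B: manual fusion, a single ordinary kernel launch for all buffers.
    dim3 grid(blocks, nbufs);
    scale_all<<<grid, threads, 0, stream>>>(d_ptrs, n, 3.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaStreamSynchronize(stream));

    // Check the results on the host.
    bool ok = true;
    for (int b = 0; b < nbufs; ++b) {
        CHECK(cudaMemcpy(host.data(), h_ptrs[b], n * sizeof(float), cudaMemcpyDeviceToHost));
        for (int i = 0; i < n; ++i) ok = ok && (host[i] == 6.0f);
        CHECK(cudaFree(h_ptrs[b]));
    }
    CHECK(cudaFree(d_ptrs));
    CHECK(cudaGraphExecDestroy(exec));
    CHECK(cudaGraphDestroy(graph));
    CHECK(cudaStreamDestroy(stream));
    return ok;
}

int main() {
    std::printf("%s\n", run_demo() ? "results match" : "results differ");
    return 0;
}
```

The graph path still executes `nbufs` separate kernel nodes, while the fused path executes one kernel. The sketch only checks correctness; which option is faster depends on the workload and should be measured.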

Replies: 1 comment

Answer selected by indra098124