Describe the proposal
List the GPU kernels with changed register usage as a comment in each PR.
This can be done by using the --ptxas-options=-v compiler flag, then parsing the compiler output with sed or another text parser, e.g.:
ptxas info : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]
We can parse the above into a *.csv file that consists of two columns: kernel name and register usage:
searchkernel 46
and then diff it against a reference file computed using the current development branch.
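The parse-and-diff step described above could be sketched in Python instead of sed. This is a minimal sketch, not an existing tool: the function names `parse_ptxas` and `diff_usage` are hypothetical, and it assumes the compiler output has the same shape as the sample above (one "Compiling entry function" line followed by one "Used N registers" line per kernel, built for a single architecture).

```python
import re

# Matches the "Compiling entry function '<signature>' for '<arch>'" line.
ENTRY_RE = re.compile(r"Compiling entry function '(?P<sig>.+)' for '(?P<arch>[^']+)'")
# Matches the "Used N registers" line that follows it.
REGS_RE = re.compile(r"Used (?P<regs>\d+) registers")


def parse_ptxas(log_text):
    """Return {kernel name: register count} from ptxas verbose output.

    Assumption: a single target architecture, so each kernel appears once.
    """
    usage = {}
    current = None
    for line in log_text.splitlines():
        entry = ENTRY_RE.search(line)
        if entry:
            # Keep only the name before the argument list, as in the
            # "searchkernel 46" example row.
            current = entry.group("sig").split("(", 1)[0]
            continue
        regs = REGS_RE.search(line)
        if regs and current is not None:
            usage[current] = int(regs.group("regs"))
            current = None
    return usage


def diff_usage(reference, candidate):
    """Return sorted (kernel, old, new) rows for kernels whose count changed.

    A kernel missing on one side is reported with None for that side.
    """
    changed = []
    for name in sorted(set(reference) | set(candidate)):
        old, new = reference.get(name), candidate.get(name)
        if old != new:
            changed.append((name, old, new))
    return changed
```

Running `parse_ptxas` on the build log of the development branch and of the PR branch, then passing both dictionaries to `diff_usage`, gives the rows that would go into the PR comment. Dropping the argument list means overloaded kernels would share one key, and a build for several architectures would overwrite earlier entries; keeping the full signature and the architecture in the key would avoid both.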
Describe alternatives you've considered
Profile GPU performance directly for each PR. This is tricky to do on a GPU kernel-by-kernel basis.
Additional context
GPU performance for our code is exquisitely sensitive to register usage, so register pressure is a good predictor of performance. This metric should tell us whether a PR introduces a major performance regression on GPU.
See also: https://stackoverflow.com/questions/12388207/interpreting-the-verbose-output-of-ptxas-part-i