The new CUDA kernel uses PyTorch's JIT compilation to build the op extension, but that introduces runtime requirements on nvcc, ninja, and possibly a full toolchain (gcc/clang, ld, system headers). This blows up container image sizes for deployments and adds a lag on first load.
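For context, a minimal sketch of the JIT route (the `fused_op` name and source files are placeholders, not the actual ones in this repo):

```python
# torch.utils.cpp_extension.load() shells out to ninja + nvcc the first time
# it runs, which is why the container needs the full CUDA toolchain and why
# the first call pays a compile-time penalty.
from torch.utils.cpp_extension import load

fused_op = load(
    name="fused_op",
    sources=["fused_op.cpp", "fused_op_kernel.cu"],
    verbose=True,
)
```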
The CPU and Triton kernels don't have this issue.
We could investigate compiling the kernel AOT at package time, but that creates a different set of problems: dealing with ABI differences across PyTorch extension builds, and a likely blow-up in the build matrix of wheel targets (not to mention getting the CUDA toolkit into cibuildwheel).
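For reference, the AOT route would look roughly like the standard `CUDAExtension` setup below (package and source names are illustrative); it's shown only to make the tradeoff concrete, since each built wheel gets pinned to a specific torch/CUDA combination:

```python
# setup.py sketch: precompile the extension at package-build time instead of
# JIT-compiling at import time. Requires nvcc in CI, and the resulting wheel
# is tied to the torch ABI and CUDA version it was built against.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="mypkg",
    ext_modules=[
        CUDAExtension(
            name="mypkg._fused_op",
            sources=["csrc/fused_op.cpp", "csrc/fused_op_kernel.cu"],
        ),
    ],
    cmdclass={"build_ext": BuildExtension},
)
```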
For now I submitted PR #223 as a potential solution: if the CUDA kernel fails to build for any reason, fall back to the Triton kernel. It's less than ideal because it suppresses the compile failure, and I'm not sure whether staying silent (to avoid spamming folks with logs) is the right call.
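Roughly, the fallback behaves like this (function names here are placeholders, not the actual ones in the PR):

```python
# If the JIT build of the CUDA extension fails, fall back to the Triton kernel.
# A single warning at least tells users why they ended up on the slower path,
# without re-raising the full compile error.
import logging

logger = logging.getLogger(__name__)

def _load_kernel():
    try:
        return _build_cuda_extension()  # hypothetical: JIT-compiles with nvcc/ninja
    except (RuntimeError, OSError) as exc:
        logger.warning("CUDA kernel build failed (%s); falling back to Triton.", exc)
        return _triton_kernel  # hypothetical: the existing Triton implementation
```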
Stepping back, it might make sense to have a more formal API for configuring which kernel to use, rather than an environment variable?
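One possible shape for such an API (purely illustrative, not an existing interface in this project):

```python
# An explicit, importable setting instead of an env var: callers opt into a
# backend up front, and "auto" keeps today's try-CUDA-then-fall-back behavior.
from enum import Enum

class KernelBackend(Enum):
    AUTO = "auto"
    CUDA = "cuda"
    TRITON = "triton"
    CPU = "cpu"

_backend = KernelBackend.AUTO

def set_kernel_backend(backend: KernelBackend) -> None:
    """Explicitly select the kernel backend for this process."""
    global _backend
    _backend = backend

# e.g. in application code, before the first kernel call:
# set_kernel_backend(KernelBackend.TRITON)
```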