
XGrammar forcing nvcc + ninja on Linux #224

Open
zbowling opened this issue Feb 27, 2025 · 0 comments

Comments

@zbowling (Contributor) commented Feb 27, 2025

The new CUDA kernel uses PyTorch's JIT extension compilation to build the op, but that imposes external requirements on nvcc and ninja, and possibly a full toolchain (gcc/clang, ld, system headers). This blows up container image sizes for deployments and adds a lag to first load.
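For illustration, here is a minimal sketch (not XGrammar's actual code) of the PyTorch JIT extension pattern being described. `torch.utils.cpp_extension.load_inline` shells out to ninja and nvcc at import time, which is why both must be present on the deployment host:

```python
import torch
from torch.utils.cpp_extension import load_inline

# Trivial CUDA op: scale a float tensor in place.
cuda_source = r"""
__global__ void scale_kernel(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

torch::Tensor scale(torch::Tensor x, double s) {
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), (float)s, n);
    return x;
}
"""

# Declaration so load_inline can generate the Python binding.
cpp_source = "torch::Tensor scale(torch::Tensor x, double s);"

# This call invokes ninja + nvcc on first run and raises if either is missing.
ext = load_inline(
    name="scale_ext",
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions=["scale"],
)
```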

The CPU and Triton kernels don't have this issue.

We could investigate compiling the kernel AOT at package time, but that creates a different set of problems: we would have to deal with ABI differences across PyTorch extension builds, and it could blow up the build matrix with new wheel targets (not to mention getting the CUDA toolkit into cibuildwheel).
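As a sketch of what the AOT route would look like (hypothetical package/source names, not a proposed patch), the extension could be built at wheel-build time with PyTorch's standard setuptools helpers. This removes nvcc/ninja from the deployment image, but ties each wheel to the exact PyTorch version and CUDA toolkit it was built against, which is where the build-matrix explosion comes from:

```python
# setup.py -- hedged sketch of AOT-building the CUDA op at package time.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="xgrammar_cuda_ops",  # hypothetical package name
    ext_modules=[
        CUDAExtension(
            name="xgrammar_cuda_ops._C",
            sources=["csrc/ops.cpp", "csrc/kernels.cu"],  # hypothetical paths
        ),
    ],
    # BuildExtension handles mixed C++/CUDA compilation and ABI flags.
    cmdclass={"build_ext": BuildExtension},
)
```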

For now I submitted PR #223 as a potential solution: if the CUDA kernel fails to build for any reason, it uses the Triton kernel instead. But it's less than ideal because it suppresses the compile failure, and I'm not sure whether staying silent is a good idea, though logging the failure risks spamming folks.
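The shape of that fallback is roughly the following (hypothetical helper names, not the actual patch), with the open question being what to do with the suppressed exception:

```python
import logging

logger = logging.getLogger(__name__)

def _load_apply_token_bitmask_kernel():
    try:
        # Hypothetical: JIT-compiles the CUDA op via nvcc/ninja on first call.
        return _build_cuda_kernel()
    except Exception as e:  # build failures surface as varied exception types
        # Suppressing silently can hide real misconfigurations; logging at
        # DEBUG rather than WARNING is one way to avoid spamming every import.
        logger.debug("CUDA kernel build failed, falling back to Triton: %s", e)
        return _triton_kernel  # hypothetical Triton implementation
```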

Stepping back, I think it might make sense to have a more formal API for configuring which kernel to use, rather than an environment variable.
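One hypothetical shape for such an API (all names invented for illustration): an explicit setter that is discoverable, testable, and can fail loudly when a requested backend is unavailable, unlike an environment variable read at import time:

```python
from enum import Enum

class KernelBackend(Enum):
    AUTO = "auto"      # current behavior: try CUDA, fall back to Triton/CPU
    CUDA = "cuda"      # require the JIT-compiled CUDA op; error if unavailable
    TRITON = "triton"
    CPU = "cpu"

_backend = KernelBackend.AUTO

def set_kernel_backend(backend: KernelBackend) -> None:
    """Select which kernel implementation to use (hypothetical API)."""
    global _backend
    _backend = backend

# Usage: opt out of the CUDA JIT path entirely.
set_kernel_backend(KernelBackend.TRITON)
```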
