
XGrammar forcing nvcc + ninja on Linux #224

Open
zbowling opened this issue Feb 27, 2025 · 0 comments

Comments

@zbowling (Contributor) commented Feb 27, 2025

The new CUDA kernel uses PyTorch's JIT extension compilation to build the op, but that imposes external requirements on nvcc and ninja, and possibly a full toolchain (gcc/clang, ld, system headers). This blows up container image sizes for deployments and adds a lag to first load.
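For illustration, here is a minimal sketch (not XGrammar's actual code) of the PyTorch JIT extension pattern being described. `torch.utils.cpp_extension.load_inline` shells out to ninja and nvcc at import time, which is why both must be present on the deployment host:

```python
import torch
from torch.utils.cpp_extension import load_inline

# Trivial CUDA op: scale a float tensor in place.
cuda_source = r"""
__global__ void scale_kernel(float* x, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= s;
}

torch::Tensor scale(torch::Tensor x, double s) {
    int n = x.numel();
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), (float)s, n);
    return x;
}
"""

# Declaration so load_inline can generate the Python binding.
cpp_source = "torch::Tensor scale(torch::Tensor x, double s);"

# This call invokes ninja + nvcc on first run and raises if either is missing.
ext = load_inline(
    name="scale_ext",
    cpp_sources=cpp_source,
    cuda_sources=cuda_source,
    functions=["scale"],
)
```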

The CPU and Triton kernels don't have this issue.

We could investigate compiling the kernel AOT at package time, but that creates a different set of problems: we would have to deal with ABI differences across PyTorch extension builds, and it could blow up the build matrix with new wheel targets (not to mention getting the CUDA toolkit into cibuildwheel).
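As a sketch of what the AOT route would look like (hypothetical package/source names, not a proposed patch), the extension could be built at wheel-build time with PyTorch's standard setuptools helpers. This removes nvcc/ninja from the deployment image, but ties each wheel to the exact PyTorch version and CUDA toolkit it was built against, which is where the build-matrix explosion comes from:

```python
# setup.py -- hedged sketch of AOT-building the CUDA op at package time.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="xgrammar_cuda_ops",  # hypothetical package name
    ext_modules=[
        CUDAExtension(
            name="xgrammar_cuda_ops._C",
            sources=["csrc/ops.cpp", "csrc/kernels.cu"],  # hypothetical paths
        ),
    ],
    # BuildExtension handles mixed C++/CUDA compilation and ABI flags.
    cmdclass={"build_ext": BuildExtension},
)
```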

For now I submitted PR #223 as a potential solution: if the CUDA kernel fails to build for any reason, it uses the Triton kernel instead. But it's less than ideal because it suppresses the compile failure, and I'm not sure whether staying silent is a good idea, though logging the failure risks spamming folks.
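The shape of that fallback is roughly the following (hypothetical helper names, not the actual patch), with the open question being what to do with the suppressed exception:

```python
import logging

logger = logging.getLogger(__name__)

def _load_apply_token_bitmask_kernel():
    try:
        # Hypothetical: JIT-compiles the CUDA op via nvcc/ninja on first call.
        return _build_cuda_kernel()
    except Exception as e:  # build failures surface as varied exception types
        # Suppressing silently can hide real misconfigurations; logging at
        # DEBUG rather than WARNING is one way to avoid spamming every import.
        logger.debug("CUDA kernel build failed, falling back to Triton: %s", e)
        return _triton_kernel  # hypothetical Triton implementation
```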

Stepping back, I think it might make sense to have a more formal API for configuring which kernel to use, rather than an environment variable.
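One hypothetical shape for such an API (all names invented for illustration): an explicit setter that is discoverable, testable, and can fail loudly when a requested backend is unavailable, unlike an environment variable read at import time:

```python
from enum import Enum

class KernelBackend(Enum):
    AUTO = "auto"      # current behavior: try CUDA, fall back to Triton/CPU
    CUDA = "cuda"      # require the JIT-compiled CUDA op; error if unavailable
    TRITON = "triton"
    CPU = "cpu"

_backend = KernelBackend.AUTO

def set_kernel_backend(backend: KernelBackend) -> None:
    """Select which kernel implementation to use (hypothetical API)."""
    global _backend
    _backend = backend

# Usage: opt out of the CUDA JIT path entirely.
set_kernel_backend(KernelBackend.TRITON)
```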
