[Bug]: Triton Error in multiproc_executor.py when running llama4 on ROCm #18088
Comments
Triton 3.3 is OK. Check the Triton version shipped in this nightly image: rocm/vllm-dev:nightly_main_20250512
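For reference, a quick way to confirm which Triton build is present inside the container (assuming Triton is importable in the image's Python environment) is a minimal version check like this sketch:

```python
# Run inside the container to confirm the bundled Triton version.
# The comment above suggests the nightly image should report a 3.3.x build.
import triton

print(triton.__version__)
```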
Actually, it is a padding issue; see #16828.
cc @tdoublep
meta-llama/Llama-4-Scout-17B-16E-Instruct -tp 4 --max-model-len 32768 --max_seq_len_to_capture 32768 --no-enable-prefix-caching --max-num-batched-tokens 32768
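As a rough offline-equivalent of the serving configuration above, the same options can be expressed through vLLM's Python LLM API; this is only a sketch (the prompt and sampling settings are placeholders, and the keyword arguments mirror the CLI flags, not the original launch command):

```python
# Sketch: offline-equivalent of the reported serving configuration,
# using vLLM's Python API; kwargs correspond to the CLI flags above.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    tensor_parallel_size=4,          # -tp 4
    max_model_len=32768,             # --max-model-len 32768
    max_seq_len_to_capture=32768,    # --max_seq_len_to_capture 32768
    enable_prefix_caching=False,     # --no-enable-prefix-caching
    max_num_batched_tokens=32768,    # --max-num-batched-tokens 32768
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)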
The best solution is to fall back. The correctness of all three approaches has been validated by running lm_eval on GSM8K with both the Llama4 and Mixtral models.
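A GSM8K check of that kind could be run with lm-evaluation-harness's Python API against a vLLM backend; the following is only a sketch, and the model/argument strings are illustrative assumptions rather than the exact command used for the validation:

```python
# Sketch: correctness check on GSM8K via lm-evaluation-harness's vLLM backend.
# model_args values here are assumptions, not the original validation setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=meta-llama/Llama-4-Scout-17B-16E-Instruct,"
        "tensor_parallel_size=4,max_model_len=32768"
    ),
    tasks=["gsm8k"],
    batch_size="auto",
)

print(results["results"]["gsm8k"])  # accuracy metrics for the GSM8K task
```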
Resolved by #18093
Your current environment
The output of python collect_env.py
🐛 Describe the bug
When running the following script: