[Bug]: LLMEngine cannot be pickled error vllm 0.6.1.post2 #8778
Comments
It seems this is a CUDA OOM issue that is being thrown as a pickling error.
This is probably the same thing that #8744 should help with.
Same issue with "mistralai/Mistral-7B-Instruct-v0.1".
Same here, even though I reduced --max-num-seqs to only 1. Are there any further solutions? (Issue with Llama 3.1 Instruct)
Same issue with Llama 3.1 405B.
It looks like this is not a pickle issue; the root cause should be:
That might be caused by a low-level invalid memory access or accumulated memory fragmentation. I suggest reducing --gpu-memory-utilization to see if that mitigates the issue.
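A minimal sketch of that suggestion using vLLM's offline `LLM` API, assuming the configuration reported below; the lowered value of 0.90 is illustrative, not a tested recommendation:

```python
from vllm import LLM, SamplingParams

# Leave more GPU headroom than the reported 0.99 so the KV-cache allocation
# is less likely to hit fragmentation or invalid-memory-access failures.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    gpu_memory_utilization=0.90,  # illustrative value, lowered from 0.99
    max_num_seqs=64,
    max_model_len=8192,
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```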
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Your current environment
The output of `python collect_env.py`
Model Input Dumps
model="Qwen/Qwen2.5-72B-Instruct"
guided_decoding_backend="outlines"
vllm_command_flags={
"--gpu-memory-utilization": 0.99,
"--max-num-seqs": 64,
}
max_model_len=8192,
library_overrides={
"vllm": "vllm==0.6.1.post2",
}
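For context, a hypothetical reconstruction of these settings through the offline `LLM` constructor; the parameters map to vLLM engine args of the same names, and passing `guided_decoding_backend` as a keyword here is an assumption, not taken from the original report:

```python
from vllm import LLM

# Hypothetical reproduction of the reported configuration.
llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",
    guided_decoding_backend="outlines",
    gpu_memory_utilization=0.99,
    max_num_seqs=64,
    max_model_len=8192,
)
```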
🐛 Describe the bug