Running DeepSeek-Coder-V2-Lite with 16 GB of GPU memory results in an out-of-memory error #4156
Comments
Hello @Lambda14, I have verified that DeepSeek-Coder-V2-Lite has 16B parameters. Consequently, 16 GB of GPU memory is not enough, which likely leads to the out-of-memory error. This seems to be working as expected; you may want to use a model with fewer parameters.
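For context, here is a rough sketch of the arithmetic behind that estimate. The bytes-per-parameter figures are general assumptions for common precisions, not specifics of how Tabby packages this model, and they cover weights only (KV cache and runtime overhead come on top):

```python
# Back-of-the-envelope VRAM estimate for a 16B-parameter model.
# Byte-per-parameter values are illustrative assumptions for common precisions.
PARAMS = 16e9

bytes_per_param = {
    "fp16/bf16": 2.0,
    "int8 (Q8)": 1.0,
    "4-bit (Q4)": 0.5,
}

for precision, nbytes in bytes_per_param.items():
    gib = PARAMS * nbytes / (1024 ** 3)
    print(f"{precision:>10}: ~{gib:.1f} GiB for weights alone")

# fp16/bf16: ~29.8 GiB, int8: ~14.9 GiB, 4-bit: ~7.5 GiB -- a 16 GB card
# only fits a 16B model with aggressive quantization, before the KV cache.
```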
@zwpaper Hello, thanks for the reply. Still, why doesn't this error occur when running the model via Docker?
OK, now I tried to run the Qwen2.5-Coder-3B model without a chat model.
Here is my nvidia-smi output with Tabby running.
The MUL_MAT failure is likely due to an issue in upstream llama.cpp: ggml-org/llama.cpp#13252
Hi @Lambda14, what is the model of your CPU? We have encountered failures caused by a CPU lacking support for certain AVX instructions.
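If it helps, one way to check the CPU flags from Python is sketched below. It assumes the third-party py-cpuinfo package (pip install py-cpuinfo); this is a generic diagnostic, not something Tabby itself requires:

```python
# Report which AVX-family instruction sets the CPU exposes.
# Assumes the third-party "py-cpuinfo" package is installed.
import cpuinfo

flags = set(cpuinfo.get_cpu_info().get("flags", []))
for ext in ("avx", "avx2", "avx512f"):
    status = "supported" if ext in flags else "missing"
    print(f"{ext}: {status}")
```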
Hello, I'm trying to run Tabby with the DeepSeek-Coder-V2-Lite model on Windows using the command: .\tabby.exe serve --model DeepSeek-Coder-V2-Lite --chat-model Qwen2-1.5B-Instruct --device cuda
and I get a memory allocation error: allocating 15712.47 MiB on device 0: cudaMalloc failed: out of memory
This happens on a server with a Tesla P100 GPU.
However, on another computer with an RTX 3070, the same model works via Docker, but very slowly.
Why is this happening?
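One thing worth checking on the P100 machine is how much VRAM is actually free before Tabby starts: a 16 GB card with the display driver or other processes already resident cannot satisfy a single allocation of roughly 15.7 GiB. A small diagnostic sketch, assuming nvidia-smi is on PATH (the 15712 MiB figure is taken from the error message above):

```python
# Compare the free VRAM reported by nvidia-smi against the ~15.7 GiB
# allocation from the error message. Assumes nvidia-smi is on PATH.
import subprocess

REQUIRED_MIB = 15712.47

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=memory.total,memory.free",
     "--format=csv,noheader,nounits"],
    capture_output=True, text=True, check=True,
).stdout

for i, line in enumerate(out.strip().splitlines()):
    total, free = (float(x) for x in line.split(","))
    fits = "fits" if free >= REQUIRED_MIB else "does NOT fit"
    print(f"GPU {i}: {free:.0f} / {total:.0f} MiB free -> "
          f"{REQUIRED_MIB:.0f} MiB {fits}")
```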