settings for python3 -m llama_cpp.server to test using GPU? #1146
Unanswered
silvacarl2 asked this question in Q&A
What are the settings to test using a GPU, or more than one GPU, with the FastAPI server? We are going to do some speed benchmarking.

These are the steps we took:

CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
export MODEL=$HOME/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
python3 -m llama_cpp.server --n_gpu_layers -1

However, it does not seem to be taking advantage of the GPU.
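One likely culprit in the steps above: the CMAKE_ARGS shown select OpenBLAS, which only accelerates CPU matrix math and cannot offload layers to a GPU, so --n_gpu_layers -1 has no effect. For NVIDIA GPUs, llama-cpp-python of that era was typically reinstalled with CMAKE_ARGS="-DLLAMA_CUBLAS=on" (newer releases use -DGGML_CUDA=on instead). A minimal sketch to confirm layers actually land on the GPU, assuming such a CUDA-enabled build and the model path above:

```python
# Minimal offload check, assuming a CUDA-enabled build of llama-cpp-python,
# e.g. reinstalled with CMAKE_ARGS="-DLLAMA_CUBLAS=on". The OpenBLAS build
# from the question only speeds up CPU math; it cannot use a GPU.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # adjust path
    n_gpu_layers=-1,  # -1 = offload every layer that fits on the GPU
    verbose=True,     # startup log should report something like
                      # "offloaded 33/33 layers to GPU" for a 7B model
)
print(llm("Q: What is 2+2? A:", max_tokens=8)["choices"][0]["text"])
```

If the verbose log still shows 0 layers offloaded, the wheel was built without GPU support and needs the forced reinstall with the CUDA flags.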
Replies: 1 comment

Did you try setting the main_gpu param?
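For the multi-GPU part of the question, the relevant knobs alongside main_gpu are tensor_split (how the weights are divided across devices) and n_gpu_layers. A hedged sketch, assuming a CUDA build and two visible GPUs; the split ratios are illustrative. The server exposes the same fields as CLI flags (e.g. --main_gpu 0 --tensor_split 0.5 0.5), though exact flag spellings can vary by version:

```python
# Sketch of multi-GPU placement with llama-cpp-python, assuming a CUDA
# build and two visible devices. The tensor_split ratios are illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    n_gpu_layers=-1,          # offload all layers
    main_gpu=0,               # primary device for scratch/small tensors
    tensor_split=[0.5, 0.5],  # proportion of weights placed on each GPU
    verbose=True,             # log shows per-device buffer sizes
)
```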