llama-cpp-python backend - treats request as sent 90 times? #34
Hi, thanks again for this project.
I finally got to test with the llama-cpp-python backend.
I used the built-in model downloader feature to download the Mistral 4.0 GGUF model.
I pip-installed gallama.
When I pinged the server with a function-calling request, it seems to have processed the request 90+ times? (I know the request itself works because I've recently used it with other local backends and with OpenAI/Azure.)
Any logs I could share?
The log output just repeats over and over, incrementing the int from 0 to 100, and then stops processing.

Comments
Hi, can you share more detail about how you make the request so I can try to reproduce the issue? Are you using the openai library, or making the request by other means, e.g. requests, curl, etc.?
Yes, the openai Python SDK; the most relevant underlying code is here: https://github.com/ShipBit/wingman-ai/blob/main/providers/open_ai.py
I'm also going to try again myself later today, just in case this was some random blip the first time running gallama.
EDIT: Yeah, this seems to continue. I'm also getting an error at the beginning (which seemingly doesn't prevent the server from starting or the model loading) that says "module not found: torch" for the app.py in gallama.
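For context, a minimal sketch of the kind of function-calling request involved, using the openai Python SDK (v1.x) pointed at a locally hosted OpenAI-compatible server. The base_url, port, model name, and tool definition below are hypothetical placeholders, not taken from the linked wingman-ai code:

```python
from openai import OpenAI

# Point the client at the local gallama server instead of api.openai.com.
# URL and API key are placeholders; a local server typically ignores the key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# A toy tool definition; the real request defines its own tools.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="Mistral",  # hypothetical model name
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)
print(response.choices[0].message)
```

A single call like this should be handled once by the backend; the report above is that it appears to be processed 90+ times.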
May I ask, are you using macOS, Linux, or Windows? I recall having a similar error on a MacBook, but I thought I had already fixed it. Do try to install the latest version.
I am travelling these few days, so it will be a bit hard to respond to you promptly, but I will try to work on this as much as I can. If you have any further bugs/issues, do share the log, as it will be helpful for me to try to reproduce them.
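A likely upgrade command, assuming the package is published on PyPI as gallama (consistent with the pip install mentioned in the next reply):

```
pip install --upgrade gallama
```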
Thanks! No rush! I'm on Windows 11. I just did the pip install of gallama last night, so it should be the latest, but I'll run that command just in case. Nothing is contained in my logs folder.
EDIT: Yes, I was already on the most recent version, confirmed.
Thanks for understanding. Did you install this natively on Windows or inside WSL on Windows?
Regular Windows 11, natively.